Zero-Downtime Postgres Migrations

March 3, 2026 · Databases

Downtime during migrations is usually a process failure, not a database limitation.

Most incidents happen when schema and application deployments are tightly coupled. A safe migration strategy separates those concerns.

Use Expand and Contract

Every risky migration should follow a five-step path:

1) Expand: add nullable columns, new tables, new indexes concurrently
2) Dual-write: old + new schema paths in application
3) Backfill: migrate historical rows in batches
4) Read-switch: read from new schema
5) Contract: remove old paths and columns later

This sequence keeps both app versions compatible throughout rollout.
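
Steps 2 through 4 live in application code, so a small sketch may help. This is a minimal illustration, not a prescribed implementation: the `Flags` fields and the `UserStore` class are hypothetical, and plain dictionaries stand in for the old and new schema paths.

```python
from dataclasses import dataclass, field


@dataclass
class Flags:
    # Hypothetical feature flags; a real system would load these from config.
    dual_write: bool = False     # step 2: also write the new schema path
    read_from_new: bool = False  # step 4: serve reads from the new path


@dataclass
class UserStore:
    flags: Flags
    old_rows: dict = field(default_factory=dict)  # stands in for the old column
    new_rows: dict = field(default_factory=dict)  # stands in for the new column

    def save_name(self, user_id: int, name: str) -> None:
        # The old path is always written until the contract step removes it.
        self.old_rows[user_id] = name
        if self.flags.dual_write:
            self.new_rows[user_id] = name

    def load_name(self, user_id: int):
        # Fall back to the old path for rows the backfill has not reached yet.
        if self.flags.read_from_new and user_id in self.new_rows:
            return self.new_rows[user_id]
        return self.old_rows.get(user_id)


flags = Flags()
store = UserStore(flags)
store.save_name(1, "Ada")    # written before dual-write: old path only
flags.dual_write = True
store.save_name(2, "Grace")  # written to both paths
flags.read_from_new = True
```

Each step here is a flag flip rather than a schema change, so either app version can run against the expanded schema, and switching `read_from_new` off again sends reads back to the old path.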

Never Block Writes with Big Locks

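Operations like ALTER TABLE ... SET NOT NULL and large index builds can block traffic: SET NOT NULL holds an ACCESS EXCLUSIVE lock while it scans the table, and a plain CREATE INDEX blocks writes for the duration of the build. Use safer alternatives: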

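-- CONCURRENTLY builds the index without blocking writes.
-- It cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_events_tenant_created
  ON events (tenant_id, created_at);

-- NOT VALID skips the scan of existing rows; new and updated rows are still checked.
ALTER TABLE events
  ADD CONSTRAINT events_status_valid
  CHECK (status IN ('pending','done','failed')) NOT VALID;

-- Checks existing rows without blocking normal reads and writes.
ALTER TABLE events VALIDATE CONSTRAINT events_status_valid;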

Validate constraints after deployment, not in the critical path.
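
For SET NOT NULL itself, one commonly used sequence reuses the NOT VALID pattern. On PostgreSQL 12 and later, SET NOT NULL skips its table scan when a valid CHECK constraint already proves the column contains no NULLs. The helper below is a hypothetical sketch that only builds the statements in order. The function name and the constraint naming scheme are illustrative, and identifiers are interpolated without quoting, so it is suitable only for trusted, hard-coded names.

```python
def safe_not_null_steps(table: str, column: str) -> list:
    """Return DDL statements, in order, that enforce NOT NULL without a long blocking scan.

    Assumes PostgreSQL 12+ and trusted identifier names (no quoting is applied).
    """
    constraint = f"{table}_{column}_not_null"  # illustrative naming scheme
    return [
        # 1. Cheap to add: existing rows are not scanned yet.
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK ({column} IS NOT NULL) NOT VALID",
        # 2. Scans existing rows without blocking normal reads and writes.
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {constraint}",
        # 3. The validated CHECK lets Postgres skip the full-table scan here.
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL",
        # 4. The CHECK is now redundant.
        f"ALTER TABLE {table} DROP CONSTRAINT {constraint}",
    ]


steps = safe_not_null_steps("events", "status")
for statement in steps:
    print(statement + ";")
```

Step 2 fails if any existing row still holds a NULL, so run it only after the backfill has finished.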

Backfill in Controlled Batches

Backfills should be resumable and rate-limited. We run chunked jobs with checkpoints and adaptive sleep based on replica lag.

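import time

BATCH_SIZE = 2000
MAX_REPLICA_LAG_MS = 3000

def run_backfill(migrate_next_batch, get_replica_lag_ms):
    # Both callables are supplied by the caller (assumed helpers, not a library API).
    # migrate_next_batch(n) migrates up to n rows and returns how many it moved.
    while migrate_next_batch(BATCH_SIZE) > 0:
        # Pause when replicas fall behind so they can catch up.
        if get_replica_lag_ms() > MAX_REPLICA_LAG_MS:
            time.sleep(2)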

This avoids saturating IO and protects read replicas.
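
The loop above shows throttling but not the checkpointing that makes a backfill resumable. One way to sketch both is keyset iteration over the primary key, saving a checkpoint after every batch. Every name and parameter here is an assumption made for illustration, and the callables are in-memory stand-ins so the example runs without a database.

```python
import time
from typing import Callable


def backfill(
    fetch_ids_after: Callable[[int, int], list],  # (last_id, limit) -> ids, ascending
    migrate: Callable[[list], None],              # copies one batch to the new schema
    load_checkpoint: Callable[[], int],           # last id already migrated
    save_checkpoint: Callable[[int], None],
    replica_lag_ms: Callable[[], float],
    batch_size: int = 2000,
    max_lag_ms: float = 3000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Migrate rows in primary-key order, resuming from the saved checkpoint."""
    last_id = load_checkpoint()
    migrated = 0
    while True:
        ids = fetch_ids_after(last_id, batch_size)
        if not ids:
            return migrated
        migrate(ids)
        last_id = ids[-1]
        save_checkpoint(last_id)  # a restart resumes after this id
        migrated += len(ids)
        lag = replica_lag_ms()
        if lag > max_lag_ms:
            # Adaptive back-off: sleep longer the further replicas fall behind (capped at 30 s).
            sleep(min(30.0, 2.0 * lag / max_lag_ms))


# In-memory stand-ins: ten source rows, and a previous run that stopped after id 4.
source_ids = list(range(1, 11))
done = []
checkpoint = {"last_id": 4}
naps = []

count = backfill(
    fetch_ids_after=lambda last, limit: [i for i in source_ids if i > last][:limit],
    migrate=done.extend,
    load_checkpoint=lambda: checkpoint["last_id"],
    save_checkpoint=lambda i: checkpoint.update(last_id=i),
    replica_lag_ms=lambda: 4500,
    batch_size=3,
    sleep=naps.append,  # record the pauses instead of actually sleeping
)
```

If a crash lands between `migrate` and `save_checkpoint`, the last batch is replayed on restart, so the per-batch migration should be idempotent.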

Migration Rollbacks Need Their Own Plan

Rollback is not always "run the down migration." If data has already moved, a down migration can destroy it. We use forward-fix rollbacks instead: keep the schema expanded, disable the new read paths, and redeploy.

Schema contraction happens only after the confidence window closes, once the new path has run cleanly in production for an agreed period.

Operational Checklist

- migration rehearsed in staging with production-like volume
- lock timeout set (never infinite)
- query plans captured before and after
- canary deploy enabled
- alerting on write latency + error rate
- kill switch for dual-write path
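
The "lock timeout set" item could look something like the following. It is a sketch under stated assumptions: `execute` is any callable that sends one SQL string to Postgres on an autocommit connection, and `LockTimeout` is a hypothetical stand-in for whatever error your driver raises when `lock_timeout` expires (Postgres reports SQLSTATE 55P03 in that case).

```python
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for the driver error raised when lock_timeout expires."""


def run_ddl_with_lock_timeout(execute, ddl, timeout="5s", attempts=5,
                              pause_s=1.0, sleep=time.sleep):
    """Run one DDL statement, giving up on the lock quickly and retrying later.

    Returns the attempt number that succeeded; re-raises after the last attempt.
    """
    # Session-level setting: statements fail rather than wait longer than this for a lock.
    execute(f"SET lock_timeout = '{timeout}'")
    for attempt in range(1, attempts + 1):
        try:
            execute(ddl)
            return attempt
        except LockTimeout:
            if attempt == attempts:
                raise
            sleep(pause_s * attempt)  # linear back-off before trying again


# Fake executor so the sketch runs without a database: the DDL fails twice, then succeeds.
sent = []
failures = iter([True, True, False])


def fake_execute(sql):
    sent.append(sql)
    if sql.startswith("ALTER") and next(failures):
        raise LockTimeout()


tries = run_ddl_with_lock_timeout(
    fake_execute,
    "ALTER TABLE events ADD COLUMN note text",
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
```

The short timeout matters because a DDL statement waiting on a lock also holds up every query queued behind it, so failing fast and retrying is usually cheaper than waiting.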

When this checklist is standard, migrations stop being scary. They become routine engineering work.
