Database Performance at 10 Million QPS

February 20, 2026 · Performance

Last Tuesday at 3:47 AM, our database hit 10 million queries per second. Not a spike. Not a benchmark. Sustained production traffic. And latency stayed under 2 milliseconds.

This wasn't luck. It was the result of eighteen months of systematic optimization. Most of which we got wrong the first time.

The Connection Pool Trap

Everyone knows about connection pooling. Few people understand it. Our first attempt at scale: 500 application servers, each with 20 connections. That's 10,000 connections to a single database.

The database didn't crash. It just got slow. Context switches became the bottleneck. The kernel spent more time scheduling than processing queries.

The fix: a two-tier pool. Application servers connect to PgBouncer. PgBouncer maintains a smaller pool to PostgreSQL. The result?

Application Servers (500)
    ↓
PgBouncer Pool (100 connections total)
    ↓
PostgreSQL (max_connections = 200)

Total active connections dropped from 10,000 to 200. CPU usage fell 60%.

Query Planning Is Everything

We found queries that took 50ms with the wrong plan and 0.1ms with the right one. The optimizer was guessing wrong because statistics were stale.

Our solution combines three approaches:

Partial indexes for hot paths. Instead of indexing the entire table, index only the data that matters for your most common queries.

CREATE INDEX idx_active_users_recent 
ON users(last_login) 
WHERE status = 'active' AND created_at > '2025-01-01';

This index is 5% the size of a full index. It fits in RAM. Queries that use it are 20x faster.

Prepared statements with plan hints. For critical queries, we force specific plans. Controversial, but necessary when the optimizer consistently gets it wrong.

Continuous statistics refresh. We run ANALYZE on frequently-changing tables every hour, not the default every day. The overhead is negligible. The benefit is massive.

The Caching Strategy

At this scale, caching isn't optional. But cache invalidation is hard. We use a hybrid approach:

Layer 1: Application cache (Redis) - 1 second TTL
Layer 2: CDN edge cache - 60 second TTL
Layer 3: Database result cache - 5 minute TTL

The key insight: different data has different freshness requirements. User profiles can be stale for minutes. Real-time notifications cannot.

We tag cache entries by data type. Invalidation becomes targeted. When a user updates their profile, we invalidate only that user's cache entries across all layers. Not the entire cache.

Read Replicas Done Right

We run 15 read replicas. But the challenge isn't replication lag—it's query routing. Applications need to know which queries can go to replicas and which need the primary.

Our middleware inspects each query:

SELECT → Replica (if no FOR UPDATE)
INSERT/UPDATE/DELETE → Primary
SELECT FOR UPDATE → Primary

We also track replication lag per replica. If lag exceeds 100ms, that replica is temporarily removed from rotation. Clients retry on another replica.

Monitoring That Matters

Traditional database monitoring tells you CPU, memory, and disk. Useful, but not actionable. We track:

- Query latency by percentile (p50, p99, p99.9)
- Lock wait time
- Checkpoint frequency
- Vacuum progress
- Cache hit ratio by table

These metrics predict problems before they become outages. Last month, rising lock wait times alerted us to a missing index before users noticed slowdowns.

The Real Lesson

Performance isn't a feature you add later. It's a constraint you design around from day one. Every schema decision, every query pattern, every caching strategy compounds.

We started with a database that struggled at 1,000 QPS. Now we handle 10,000x that. The hardware didn't change. The thinking did.

← Back to Home