Cache Invalidation for Real-Time APIs
Real-time systems do not remove the need for caches. They remove your excuses for bad invalidation.
Teams often say they need "live data" and treat that as a reason to bypass caching entirely. The result is usually higher cost, more load on downstream dependencies, and worse tail latency. Freshness matters, but it still has to be engineered.
Freshness Is a Product Requirement
The first question is not "Redis or CDN?" It is "how stale is acceptable for this surface?"
- Portfolio balances may need sub-second freshness.
- Activity feeds may tolerate a few seconds.
- Aggregate dashboards may tolerate a minute if the UI shows when the data was last updated.
Once freshness is explicit, cache policy becomes an engineering problem instead of an argument.
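One way to make freshness explicit is to write the budgets down as data that both the cache layer and reviewers can read. The sketch below is only an illustration: the surface names, the `FreshnessBudget` type, and the exact numbers are assumptions loosely based on the examples above, not values from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreshnessBudget:
    max_staleness_s: float  # oldest value this surface is allowed to show
    disclose_age: bool      # whether the UI must show when data was updated


# Hypothetical surfaces and budgets; real numbers come from product decisions.
BUDGETS = {
    "portfolio_balance": FreshnessBudget(max_staleness_s=0.5, disclose_age=False),
    "activity_feed": FreshnessBudget(max_staleness_s=3.0, disclose_age=False),
    "aggregate_dashboard": FreshnessBudget(max_staleness_s=60.0, disclose_age=True),
}


def is_fresh_enough(surface: str, age_s: float) -> bool:
    """Return True if a cached value of this age may be served on the surface."""
    return age_s <= BUDGETS[surface].max_staleness_s
```

With a table like this, "can we cache the feed?" becomes a lookup: `is_fresh_enough("activity_feed", 2.0)` is true, while the same age on `portfolio_balance` is not.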
Invalidate by Dependency, Not by Guessing
Good invalidation starts with key design. A cached object should be traceable to the entities and events that can make it wrong.
cache key:
  feed:user:4821:v17
dependencies:
  - follow graph for user 4821
  - posts by followed authors
  - mute/block settings
If you cannot name the dependencies, you cannot invalidate correctly.
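Naming the dependencies can be made mechanical with a reverse index from each dependency to the cache keys built on it. The following is a minimal in-memory sketch; the `DependencyIndex` class and the dependency tags such as `posts:author:77` are hypothetical names chosen for illustration, and a production system would likely keep this index in a shared store.

```python
from collections import defaultdict


class DependencyIndex:
    """Maps each dependency tag to the cache keys that were built from it."""

    def __init__(self):
        self._keys_by_dep = defaultdict(set)
        self._deps_by_key = {}

    def register(self, cache_key, deps):
        """Record (or replace) the dependencies of a cached object."""
        deps = set(deps)
        self.forget(cache_key)
        self._deps_by_key[cache_key] = deps
        for dep in deps:
            self._keys_by_dep[dep].add(cache_key)

    def forget(self, cache_key):
        """Drop a key from the index, for example after it is evicted."""
        for dep in self._deps_by_key.pop(cache_key, ()):
            self._keys_by_dep[dep].discard(cache_key)

    def keys_affected_by(self, dep):
        """Return the cache keys that a change to this dependency can make wrong."""
        return set(self._keys_by_dep.get(dep, ()))


index = DependencyIndex()
index.register("feed:user:4821:v17",
               {"follows:4821", "posts:author:77", "mutes:4821"})
index.register("feed:user:9000:v3",
               {"follows:9000", "posts:author:77", "mutes:9000"})
```

A new post by author 77 touches both feeds, while a change to user 4821's mute settings touches only one. Events map to a bounded set of keys instead of a guess.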
Fan-Out Needs a Repair Path
Event-driven invalidation is never perfect. Messages arrive late. Consumers crash. One region lags behind another. Real-time systems need a repair loop in addition to push invalidation:
- Short TTLs on critical paths.
- Background reconciliation for hot keys.
- Version checks when serving cache hits.
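Two of these repair mechanisms, short TTLs and version checks on hits, can be combined in the read path. The sketch below assumes the caller can obtain the entity's current version cheaply from an authoritative source (for example a counter bumped on every write). That assumption, the `VersionedCache` name, and the injectable `clock` parameter are illustrative and not taken from the article.

```python
import time


class VersionedCache:
    """Cache whose hits are rejected when expired or older than the known version."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (value, version, stored_at)

    def put(self, key, value, version):
        self._entries[key] = (value, version, self._clock())

    def get(self, key, current_version):
        """Return the cached value, or None if the caller must read the origin."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, version, stored_at = entry
        expired = self._clock() - stored_at > self._ttl_s
        outdated = version < current_version
        if expired or outdated:
            # Repair path: drop the entry even though no invalidation arrived.
            del self._entries[key]
            return None
        return value
```

If an invalidation message for version 18 is lost, a read that knows the current version is 18 still refuses the version-17 entry. The TTL bounds the damage when the version source itself is behind.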
Do Not Over-Broadcast
Many systems solve invalidation uncertainty by purging whole namespaces. That works until traffic spikes.
Broad invalidation turns one write into a read storm: every purged key misses at once, and each miss becomes an origin read. The database pays for your lack of precision.
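The difference in blast radius is easy to see with a toy cache. This sketch uses a plain dictionary as a stand-in for the cache; the key layout and the helper names `purge_namespace`, `purge_keys`, and `build_cache` are assumptions made for the example.

```python
def purge_namespace(cache: dict, prefix: str) -> int:
    """Evict every key under a prefix and return how many were dropped."""
    doomed = [key for key in cache if key.startswith(prefix)]
    for key in doomed:
        del cache[key]
    return len(doomed)


def purge_keys(cache: dict, keys) -> int:
    """Evict only the named keys and return how many were actually present."""
    removed = 0
    for key in keys:
        if key in cache:
            del cache[key]
            removed += 1
    return removed


def build_cache(n_users: int) -> dict:
    """Toy cache holding one rendered feed per user."""
    return {f"feed:user:{uid}": f"rendered feed {uid}" for uid in range(n_users)}
```

Against a cache of 10,000 feeds, `purge_namespace(cache, "feed:")` evicts all 10,000 entries for a single write, and each one becomes an origin read on its next request. `purge_keys` with the three feeds that actually depend on the write evicts three.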
Stale-While-Revalidate Is a UX Tool
For non-critical surfaces, stale-while-revalidate keeps the interface fast while background refresh repairs freshness. That is often better than blocking every request on the newest possible value.
serve cached response if age < 5s
revalidate async if age > 2s
force origin read if age >= 5s or critical bit changed
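The rules above can be sketched as a small read-through cache. This reading treats 2 seconds as the fresh window and 5 seconds as the hard limit, after which the request blocks on the origin. The `SwrCache` class, the `fetch` callback, and the injectable `clock` and `spawn` parameters are hypothetical, added so the behaviour can be exercised without real threads or waiting.

```python
import threading
import time

FRESH_S = 2.0      # younger than this: serve without refreshing
MAX_STALE_S = 5.0  # this age or older: never serve from cache


def _spawn_thread(task):
    threading.Thread(target=task, daemon=True).start()


class SwrCache:
    def __init__(self, fetch, clock=time.monotonic, spawn=_spawn_thread):
        self._fetch = fetch          # origin read: key -> value
        self._clock = clock
        self._spawn = spawn
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, stored_at)
        self._refreshing = set()     # keys with a background refresh in flight

    def _store(self, key):
        value = self._fetch(key)
        with self._lock:
            self._entries[key] = (value, self._clock())
        return value

    def _refresh(self, key):
        try:
            self._store(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key, critical_changed=False):
        with self._lock:
            entry = self._entries.get(key)
            age = None if entry is None else self._clock() - entry[1]
            must_block = entry is None or critical_changed or age >= MAX_STALE_S
            start_refresh = (not must_block and age > FRESH_S
                             and key not in self._refreshing)
            if start_refresh:
                self._refreshing.add(key)
        if must_block:
            return self._store(key)  # blocking origin read
        if start_refresh:
            self._spawn(lambda: self._refresh(key))
        return entry[0]              # serve the cached, possibly stale, value
```

The `_refreshing` set keeps a hot key from starting one background refresh per request. If the origin read fails, the key is released so a later request can try again.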
Measure Correctness, Not Just Hit Rate
A 98% cache hit rate can hide a broken system. You also need freshness metrics:
- cache hit age by endpoint
- event-to-invalidation delay
- stale serve rate on critical entities
- repair success rate after missed invalidation
That is how you know whether the API is both fast and right.
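As a rough illustration, two of these metrics (hit age by endpoint and stale serve rate) can be computed from data the cache already has at serve time. The `FreshnessMetrics` class, the nearest-rank `percentile` helper, and the endpoint name are assumptions for this sketch. Event-to-invalidation delay and repair success rate would need timestamps from the event pipeline, which are not modelled here.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


class FreshnessMetrics:
    def __init__(self):
        self._hit_ages = defaultdict(list)    # endpoint -> ages of served hits
        self._stale_serves = defaultdict(int)  # endpoint -> hits over budget

    def record_hit(self, endpoint, age_s, budget_s):
        """Record a cache hit and whether it exceeded the freshness budget."""
        self._hit_ages[endpoint].append(age_s)
        if age_s > budget_s:
            self._stale_serves[endpoint] += 1

    def stale_serve_rate(self, endpoint):
        hits = self._hit_ages.get(endpoint, [])
        if not hits:
            return 0.0
        return self._stale_serves.get(endpoint, 0) / len(hits)

    def hit_age_percentile(self, endpoint, pct):
        return percentile(self._hit_ages[endpoint], pct)


metrics = FreshnessMetrics()
for age in (0.1, 0.2, 0.3, 4.0):
    metrics.record_hit("/balance", age_s=age, budget_s=0.5)
```

In this example every request was a cache hit, yet one in four responses was served past its budget. A hit-rate dashboard alone would not show that.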