
Caching Strategies for Node.js APIs: Redis, In-Memory, and Edge

Five caching layers I use in production Node.js and Next.js APIs (in-memory LRU, Redis, edge CDN, stale-while-revalidate, and request coalescing), and when each one matters.

Hassan Javed

March 2026

10 min read

Cache or scale?

Most "we need to scale our backend" conversations end with the same conclusion: you needed a cache.

A well-cached API can serve 100x its uncached throughput on the same hardware. Knowing where to cache, what to cache, and how to invalidate it is a senior-level skill that most mid-level engineers haven't formalized.

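Here are the five I reach for. Strictly speaking, the first three are places to store data, and the last two are patterns that keep those stores well-behaved under load.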

Layer 1: In-memory LRU cache

A bounded Map inside your Node.js process that evicts the least-recently-used entries once it's full. It's the cheapest, fastest cache you can have, but it only lives as long as the process.

Use for:

▸Config lookups, feature flags

▸Permission or role data that rarely changes

▸Computed values within a request (memoize)

Don't use for:

▸Anything that needs cross-process consistency (with multiple Node workers, each process holds its own copy, so some workers keep serving stale data after a change)

▸Data over 100MB (RAM cost)

Library: lru-cache (the canonical one). 10 minutes to integrate.
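To show what lru-cache handles for you, here's a minimal hand-rolled sketch. It's not a replacement for the library. It relies on a JavaScript Map iterating in insertion order, so re-inserting a key on every read keeps the least-recently-used entry at the front. TinyLru and its maxEntries/ttlMs parameters are illustrative names, and the injectable clock exists only to make it testable.

```typescript
// Minimal LRU cache with a per-entry TTL (illustrative; prefer lru-cache in production).
class TinyLru<K, V> {
  private readonly store = new Map<K, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used (Map keeps insertion order).
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.store.delete(key);
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict from the front of the Map: the least recently used entries.
    while (this.store.size > this.maxEntries) {
      const oldest = this.store.keys().next();
      if (oldest.done) break;
      this.store.delete(oldest.value);
    }
  }
}
```

In practice you'd create one instance per process (say, new TinyLru<string, boolean>(1_000, 60_000) for feature flags) and read through it before going to Redis or the DB.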

Layer 2: Redis

The workhorse of API caching. Network-attached, shared across all your workers.

Use for:

▸Query result caching (DB → Redis → response)

▸Session data (when not using JWT)

▸Rate limit counters

▸Idempotency keys for webhooks

▸Anything that benefits from cross-worker consistency

Don't use for:

▸Persistent data (Redis is a cache, not a DB — even with persistence enabled)

▸Data over 500MB without sharding

Library: ioredis for the client. Hosted: Upstash (serverless-friendly, generous free tier), Redis Cloud (more mature), or self-hosted on Railway.
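Here's a minimal cache-aside sketch for the query-result case. RedisLike is a hypothetical interface that covers only the two calls used here. It's shaped after ioredis's get(key) and set(key, value, "EX", seconds), so a real client should slot in. FakeRedis is an in-memory stand-in that lets the snippet run without a server. Values are stored as JSON, so the loader should return something JSON-serializable.

```typescript
// Hypothetical minimal Redis surface, shaped after ioredis's get / set with "EX".
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
}

// Cache-aside: try Redis first, fall back to the loader (e.g. a DB query),
// then store the result with a TTL so every worker can reuse it.
async function cacheAside<T>(
  redis: RedisLike,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached) as T;
  const fresh = await load();
  await redis.set(key, JSON.stringify(fresh), "EX", ttlSeconds);
  return fresh;
}

// In-memory stand-in so the sketch runs without a Redis server (ignores expiry).
class FakeRedis implements RedisLike {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string, _mode: "EX", _seconds: number): Promise<unknown> {
    this.data.set(key, value);
    return "OK";
  }
}
```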

Layer 3: Edge or CDN caching

Cache responses at the CDN layer (Cloudflare, Vercel's edge cache, CloudFront). This is the biggest latency win: a cache hit is served from a CDN location near the user and never reaches your origin.

Use for:

▸Public, idempotent GET endpoints

▸Anything safely cacheable for over 30 seconds

▸Static assets (obvious)

▸API responses for unauthenticated users

Don't use for:

▸User-specific data (a per-user cache key destroys the edge hit rate, and a shared key risks serving one user's data to another)

▸Data that changes faster than your TTL

Configure via Cache-Control headers (s-maxage, stale-while-revalidate). Vercel and Cloudflare both respect these.
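Here's a sketch of the header side, using Node's built-in http types. The 60-second s-maxage and 300-second stale-while-revalidate are placeholder values. Treating any request that carries an Authorization header as user-specific is an assumption about how your auth works. Adjust both for your endpoint.

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

// Cache-Control for responses a shared cache (CDN) may store.
// s-maxage applies only to shared caches; stale-while-revalidate lets the CDN
// serve an expired copy while it refetches from the origin.
function edgeCacheControl(sMaxAgeSeconds: number, swrSeconds: number): string {
  return `public, s-maxage=${sMaxAgeSeconds}, stale-while-revalidate=${swrSeconds}`;
}

// Personalized responses must never be stored by a shared cache.
function cacheControlFor(isAuthenticated: boolean): string {
  return isAuthenticated ? "private, no-store" : edgeCacheControl(60, 300);
}

// Hypothetical handler for a public GET endpoint; wire it up with
// createServer(handler).listen(3000) from node:http.
function handler(req: IncomingMessage, res: ServerResponse): void {
  const isAuthenticated = req.headers.authorization !== undefined;
  res.setHeader("Cache-Control", cacheControlFor(isAuthenticated));
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ items: [] }));
}
```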

Layer 4: Stale-while-revalidate (SWR)

A pattern, not a layer. When data is stale, serve the stale version immediately AND kick off a background refresh.

Use for:

▸Data where "a few seconds stale" is acceptable

▸Lists, feed data, dashboards

▸Almost any read-heavy GET endpoint

Don't use for:

▸Banking, inventory, anything where staleness causes correctness bugs

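Implementation: Next.js's revalidate setting does this natively. Manually: return the cached value, then fire off the refresh without awaiting it, and catch its errors so a failed refresh can't take down the process.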
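Here's a minimal in-process sketch of the manual version. An entry is fresh for freshMs. After that, it can still be served, stale, for another staleMs. A stale read returns immediately and starts at most one background refresh per key. SwrCache and its parameters are hypothetical names. A real implementation would log refresh failures rather than swallow them.

```typescript
type Entry<T> = { value: T; freshUntil: number; staleUntil: number };

// In-process stale-while-revalidate cache (illustrative sketch).
class SwrCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly refreshing = new Set<string>();
  private readonly freshMs: number;
  private readonly staleMs: number;
  private readonly now: () => number;

  constructor(freshMs: number, staleMs: number, now: () => number = Date.now) {
    this.freshMs = freshMs;
    this.staleMs = staleMs;
    this.now = now;
  }

  async get(key: string, load: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    const t = this.now();
    if (entry && t < entry.freshUntil) return entry.value; // fresh hit

    if (entry && t < entry.staleUntil) {
      // Stale hit: answer now, refresh in the background (at most one refresh per key).
      if (!this.refreshing.has(key)) {
        this.refreshing.add(key);
        void this.fill(key, load)
          .catch(() => {
            // Refresh failed: keep serving the stale value. Log this in real code.
          })
          .finally(() => this.refreshing.delete(key));
      }
      return entry.value;
    }

    // Miss, or too stale to serve: the caller waits for fresh data.
    return this.fill(key, load);
  }

  private async fill(key: string, load: () => Promise<T>): Promise<T> {
    const value = await load();
    const t = this.now();
    this.entries.set(key, {
      value,
      freshUntil: t + this.freshMs,
      staleUntil: t + this.freshMs + this.staleMs,
    });
    return value;
  }
}
```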

Layer 5: Request coalescing

When 100 concurrent requests ask for the same uncached data, a naive implementation fires 100 identical DB queries. With coalescing, one query runs and the other 99 requests wait for its result.

Use for:

▸High-fanout endpoints (homepage stats, popular items)

▸Anything where you'd otherwise stampede the DB on cache miss

Implementation: wrap your fetch function in a "dedupe map". Requests for the same key get the same in-flight promise, and the entry is cleared once that promise settles. Use promise-memoize, or hand-roll it in about 20 lines.

This is the difference between a graceful cache miss and a thundering herd that takes down your DB.
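A hand-rolled dedupe map can be quite small. Here's a sketch. coalesce is a hypothetical helper name. It shares one in-flight promise per key and deletes the entry once the promise settles, so a failed load is retried on the next call instead of being cached. The map is per process: with several workers, each one still issues its own query.

```typescript
// Wraps an async loader so concurrent calls with the same key share one in-flight promise.
function coalesce<T>(load: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>();

  return (key: string): Promise<T> => {
    const existing = inFlight.get(key);
    if (existing) return existing; // piggyback on the query already running

    // Start the query and forget it once it settles (success or failure),
    // so later calls trigger a fresh load rather than reusing a stale result.
    const promise = load(key).finally(() => inFlight.delete(key));
    inFlight.set(key, promise);
    return promise;
  };
}
```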

A real-world stacking example

A SaaS dashboard endpoint that returns the user's recent activity:

1. Edge cache: No (user-specific)

2. Redis cache: Yes, key activity:<userId>, TTL 30s

3. In-memory cache: Yes, same key, TTL 5s (skips the Redis network round trip for repeat requests within those 5 seconds)

4. SWR: Yes, when the Redis entry is past its freshness window, serve it immediately and refresh it in the background

5. Request coalescing: Yes, on a cache miss, concurrent requests in the same process share one DB query (stopping separate workers from each querying needs a distributed lock, e.g. in Redis)

Result: from around 80ms per request (cache miss) to around 3ms (in-memory hit), with the DB seeing maybe 1 query per 30 seconds per user.
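Here's a condensed sketch of items 2, 3, and 5 above composed into one read path: in-memory (5s) → coalesced Redis lookup (30s) → DB. I left SWR out to keep it short. RedisLike, Activity, makeActivityReader, and queryDb are hypothetical names. RedisLike mirrors ioredis's get/set-with-EX shape. The local Map is unbounded here, and production code would use an LRU instead.

```typescript
// Hypothetical minimal Redis surface, shaped after ioredis's get / set with "EX".
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
}

type Activity = { userId: string; events: string[] };

// Builds a read path: in-memory (5s) -> coalesced Redis lookup (30s) -> DB.
function makeActivityReader(
  redis: RedisLike,
  queryDb: (userId: string) => Promise<Activity>,
  now: () => number = Date.now,
): (userId: string) => Promise<Activity> {
  // Unbounded for brevity; use an LRU in production.
  const local = new Map<string, { value: Activity; expiresAt: number }>();
  const inFlight = new Map<string, Promise<Activity>>();

  async function loadThroughRedis(key: string, userId: string): Promise<Activity> {
    const cached = await redis.get(key);
    if (cached !== null) return JSON.parse(cached) as Activity;
    const fresh = await queryDb(userId); // the only place the DB is touched
    await redis.set(key, JSON.stringify(fresh), "EX", 30);
    return fresh;
  }

  return async (userId: string): Promise<Activity> => {
    const key = `activity:${userId}`;
    const hit = local.get(key);
    if (hit && now() < hit.expiresAt) return hit.value; // in-memory hit: no network at all

    // Coalescing: concurrent misses for the same key share one Redis/DB round trip.
    let pending = inFlight.get(key);
    if (!pending) {
      pending = loadThroughRedis(key, userId).finally(() => inFlight.delete(key));
      inFlight.set(key, pending);
    }
    const value = await pending;
    local.set(key, { value, expiresAt: now() + 5_000 });
    return value;
  };
}
```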

Invalidation: the hard part

Caching is easy. Invalidating correctly is hard.

TTL-based (lazy)

The simplest option: entries expire after N seconds. You trade up to N seconds of staleness for zero invalidation logic, so use it when that much staleness is acceptable.

Event-driven (eager)

When data mutates, you publish an invalidation event. Listeners flush related cache keys. Use when you need consistency within seconds of a write.
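Here's a single-process sketch of the eager pattern. Node's EventEmitter stands in for a real cross-process channel (Redis pub/sub is a common choice). The key names and updateProject are hypothetical. Ordering matters: publish the invalidation after the write commits. If you invalidate first, a read that lands in between can re-cache the old value. A TTL is still a useful backstop for the races that remain.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for a cross-process channel such as Redis pub/sub (single process here).
const bus = new EventEmitter();
const INVALIDATE = "cache:invalidate";

// Each worker keeps its own cache and drops keys when told to.
const localCache = new Map<string, unknown>();
bus.on(INVALIDATE, (keys: string[]) => {
  for (const key of keys) localCache.delete(key);
});

// Hypothetical write path: persist first, then announce which keys are now stale.
async function updateProject(
  projectId: string,
  save: (projectId: string) => Promise<void>,
): Promise<void> {
  await save(projectId);
  bus.emit(INVALIDATE, [`project:${projectId}`, "projects:list"]);
}
```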

Cache tags (Next.js)

Next.js's revalidateTag is the cleanest invalidation primitive I've used. Tag a fetch with a label, then later call revalidateTag(label) to invalidate everything carrying that tag.
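Outside Next.js, you can build the same idea as a tag index: remember which keys were stored under each tag, then drop them all in one call. This is a sketch of the concept, not how Next.js implements revalidateTag. TaggedCache and invalidateTag are hypothetical names.

```typescript
// Maps each tag to the cache keys stored under it, so one call can drop a whole group.
class TaggedCache<V> {
  private readonly values = new Map<string, V>();
  private readonly keysByTag = new Map<string, Set<string>>();

  set(key: string, value: V, tags: string[]): void {
    this.values.set(key, value);
    for (const tag of tags) {
      let keys = this.keysByTag.get(tag);
      if (!keys) {
        keys = new Set<string>();
        this.keysByTag.set(tag, keys);
      }
      keys.add(key);
    }
  }

  get(key: string): V | undefined {
    return this.values.get(key);
  }

  // Similar in spirit to revalidateTag(tag): every entry carrying the tag is dropped.
  invalidateTag(tag: string): void {
    const keys = this.keysByTag.get(tag);
    if (!keys) return;
    for (const key of keys) this.values.delete(key);
    this.keysByTag.delete(tag);
  }
}
```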

What I don't cache

Things I see over-cached:

▸Decoded JWTs. The token is already stateless; decoding it takes microseconds.

▸Already-fast queries. A 5ms DB query doesn't need a cache layer. The complexity isn't worth it.

▸One-off compute. If a value is computed once at startup, just hold it in module scope.

TL;DR

▸5 layers: in-memory LRU, Redis, edge CDN, SWR, request coalescing

▸Stack them. A real API often uses 3-4 layers at once.

▸Cache hot reads; don't cache cheap operations.

▸Invalidation is harder than caching. Pick a strategy on day one.

If your Node.js or Next.js API is hitting scaling limits and you want a senior to architect the caching strategy, contact me.
