Have you used Redis for more than simple caching?
Until recently, that's exactly what Redis was for me: GET/SET, TTL, expiration, and life moves on.
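For reference, that "simple caching" usage is essentially the classic cache-aside pattern: read from Redis first, and only on a miss call the slow upstream and write the result back with a TTL. This is a minimal sketch under assumptions: `CacheAside` and `TTLDict` are hypothetical names, values are JSON-serialized, and `TTLDict` only imitates the `get` / `set(..., ex=...)` subset of a redis-py `redis.Redis` client so the example runs without a server.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLDict:
    """In-memory stand-in for the get/set(ex=...) subset of redis.Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, Optional[float]]] = {}

    def get(self, name: str) -> Optional[bytes]:
        item = self._data.get(name)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[name]  # expired: behave like a miss
            return None
        return value

    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> bool:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True


class CacheAside:
    """Cache-aside: check the cache first, call the slow upstream only on a miss."""

    def __init__(self, client: Any, ttl_seconds: int = 300) -> None:
        self.client = client
        self.ttl = ttl_seconds
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        cached = self.client.get(key)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        self.misses += 1
        value = fetch()  # slow path, e.g. the upstream catalog API
        self.client.set(key, json.dumps(value).encode("utf-8"), ex=self.ttl)
        return value
```

Swapping `TTLDict()` for a `redis.Redis(...)` client should work unchanged, since only `get` and `set(name, value, ex=...)` are used; the hit/miss counters are a simple way to see how often requests fall through to the slow path.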
Then I started using it as a persistent cache for the embedding matrices of a semantic router (via RedisVectorStore), keyed by fingerprint + intent and stored as compressed NPZ. Later I added the Redis Checkpointer for LangGraph to persist state across runs and threads. It worked well… but that was only the beginning of my journey with Redis.
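A minimal sketch of that kind of embedding cache, assuming NumPy and a redis-py client created without `decode_responses=True` (so values come back as raw bytes). The fingerprint recipe (a hash of the model name plus the sorted route utterances), the `router:emb:` key prefix, and all function names are hypothetical illustrations, not the actual API of the router or of RedisVectorStore; `DictKV` stands in for `redis.Redis` so the code runs locally.

```python
import hashlib
import io
import json
from typing import Dict, List, Optional

import numpy as np


class DictKV:
    """In-memory stand-in for redis.Redis (get/set only) so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)

    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> bool:
        self._data[name] = value  # TTL (ex) is ignored by this stand-in
        return True


def route_fingerprint(model_name: str, utterances: List[str]) -> str:
    """Hypothetical fingerprint: changes whenever the model or the route examples change."""
    payload = json.dumps({"model": model_name, "utterances": sorted(utterances)})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def embedding_key(fingerprint: str, intent: str) -> str:
    # Hypothetical key layout: <prefix>:<fingerprint>:<intent>
    return f"router:emb:{fingerprint}:{intent}"


def save_embeddings(client, key: str, matrix: np.ndarray, ttl_seconds: Optional[int] = None) -> None:
    buf = io.BytesIO()
    np.savez_compressed(buf, embeddings=matrix)  # zip-compressed .npz bytes
    client.set(key, buf.getvalue(), ex=ttl_seconds)


def load_embeddings(client, key: str) -> Optional[np.ndarray]:
    raw = client.get(key)
    if raw is None:
        return None  # cache miss: caller recomputes the embeddings and saves them
    with np.load(io.BytesIO(raw)) as npz:
        return npz["embeddings"]
```

Because the fingerprint is part of the key, editing a route or switching models simply produces a new key; the stale entry is never read again and can be left to expire via its TTL.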
When an LLM-powered conversational assistant starts depending on heavy APIs and large data stores (product catalogs, preference history, events…), the bill arrives fast:
cache miss → slow API call → p99 latency climbs → UX suffers → LLM cost climbs.
That's where I discovered Redis Stack as a deterministic retrieval and analytics layer for LLM applications. What surprised me most was seeing it deliver, for free (or close to it), things I used to think required a separate vector DB or Elasticsearch:
- native fuzzy search (tolerating typos such as "teniz" → "tênis" or "cdbi ipca" → "CDB IPCA");
- vector (embedding) search for semantic similarity;
- powerful structured filters (tags, numeric ranges) combined with full-text search;
- real-time analytics (metrics, events, time series).
All running inside the same Redis, without standing up a parallel infrastructure.
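To make "all in the same Redis" concrete, here is a sketch of one RediSearch index over RedisJSON documents that carries text, tag, numeric, and vector fields, plus helpers that build query strings in RediSearch syntax (`%%term%%` fuzzy matching, `@field:{...}` tags, `@field:[min max]` ranges, and a `=>[KNN ...]` hybrid clause). The index name, the `product:` prefix, the field names, and the 384-dimension embeddings are assumptions for illustration. Whether a fuzzy term like "teniz" reaches "tênis" depends on the edit distance and on how accents are tokenized in your index, so treat the distance setting as something to tune.

```python
import re
import struct
from typing import List, Optional, Sequence, Union

INDEX_NAME = "idx:products"  # hypothetical index name
VECTOR_DIM = 384             # assumption: must match your embedding model's output size

# Any non-alphanumeric character gets backslash-escaped inside a TAG query value.
_TAG_SPECIAL = re.compile(r"([^A-Za-z0-9_])")


def create_index_args(index: str = INDEX_NAME, dim: int = VECTOR_DIM) -> List[str]:
    """FT.CREATE arguments, e.g. for r.execute_command(*create_index_args())."""
    return [
        "FT.CREATE", index, "ON", "JSON", "PREFIX", "1", "product:",
        "SCHEMA",
        "$.name", "AS", "name", "TEXT",          # full-text and fuzzy matching
        "$.category", "AS", "category", "TAG",   # exact-match filters
        "$.price", "AS", "price", "NUMERIC",     # range filters
        "$.stock", "AS", "stock", "NUMERIC",
        "$.embedding", "AS", "embedding", "VECTOR", "HNSW", "6",
        "TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE",
    ]


def fuzzy_terms(text: str, distance: int = 1) -> str:
    """Wrap each token in %...% (RediSearch allows Levenshtein distance 1 to 3)."""
    if not 1 <= distance <= 3:
        raise ValueError("fuzzy distance must be between 1 and 3")
    pad = "%" * distance
    return " ".join(f"{pad}{tok}{pad}" for tok in text.lower().split())


def tag_filter(field: str, values: Sequence[str]) -> str:
    escaped = [_TAG_SPECIAL.sub(r"\\\1", v) for v in values]
    return f"@{field}:{{{' | '.join(escaped)}}}"


def numeric_range(field: str, lo: Union[int, float, str], hi: Union[int, float, str]) -> str:
    return f"@{field}:[{lo} {hi}]"


def product_query(text: str, categories: Sequence[str], min_price: float,
                  max_price: float, k: Optional[int] = None) -> str:
    base = " ".join([
        fuzzy_terms(text, distance=2),
        tag_filter("category", categories),
        numeric_range("price", min_price, max_price),
        numeric_range("stock", 1, "+inf"),  # only items in stock
    ])
    if k is None:
        return base
    # Hybrid: the filters restrict the candidates, KNN ranks them by vector similarity.
    return f"({base})=>[KNN {k} @embedding $vec AS vector_score]"


def to_float32_bytes(vec: Sequence[float]) -> bytes:
    """Pack an embedding as little-endian FLOAT32 bytes for the $vec query parameter."""
    return struct.pack(f"<{len(vec)}f", *vec)
```

With redis-py, you would run `r.execute_command(*create_index_args())` once, then search with something like `r.ft("idx:products").search(Query(q).dialect(2), query_params={"vec": to_float32_bytes(embedding)})`, where `Query` comes from `redis.commands.search.query`; the KNN clause requires query dialect 2 or later.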
Where this can fit
- e-commerce: similar products + price range + available stock;
- customer support: finding relevant tickets or FAQs even with typos;
- logistics: detecting delay spikes and correlating with routes/events in real time;
- media/content: hybrid recommendation (semantic + tags + recency).
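For the logistics case, RedisTimeSeries can average delays per time bucket on the server (`TS.RANGE ... AGGREGATION avg <bucket_ms>`). The sketch below reproduces that bucketing in plain Python so it runs anywhere, then applies a hypothetical spike rule: flag a bucket whose average exceeds `factor` times the median of the previous `window` buckets. The function names, thresholds, and the rule itself are illustrative assumptions, not a recommended detector.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, delay value)


def bucket_avg(samples: List[Sample], bucket_ms: int) -> List[Tuple[int, float]]:
    """Average samples per fixed-size bucket, with buckets aligned to timestamp 0."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in samples:
        start = ts - (ts % bucket_ms)  # start timestamp of the bucket this sample falls in
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def delay_spikes(buckets: List[Tuple[int, float]], window: int = 5, factor: float = 2.0) -> List[int]:
    """Return start timestamps of buckets whose average exceeds factor x the median of the previous window."""
    flagged = []
    for i in range(window, len(buckets)):
        baseline = median(value for _, value in buckets[i - window:i])
        start, value = buckets[i]
        if baseline > 0 and value > factor * baseline:
            flagged.append(start)
    return flagged
```

Against a live server, the equivalent of `bucket_avg` would be along the lines of `r.ts().range(key, "-", "+", aggregation_type="avg", bucket_size_msec=60000)` in redis-py, leaving only the spike rule to run in the application (or in whatever consumes the series).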
But there's no free lunch
The trade-offs show up fast:
- RAM is expensive, and indexes consume a lot of it (especially when you load vectors + heavy indexing);
- index tuning and NoSQL data modeling demand discipline (fat, slim, hybrid: each has its price);
- ingestion and updates can become a bottleneck if volume is high and indexing isn't designed well.
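A quick back-of-the-envelope check for the RAM point: raw FLOAT32 vectors alone cost `count × dim × 4` bytes. For example, 1,000,000 vectors at 1,536 dimensions is 6,144,000,000 bytes (about 5.72 GiB), while 384 dimensions brings it down to about 1.43 GiB. This sketch counts only the raw vectors; the HNSW graph, the other indexed fields, and the source documents themselves (for example, an embedding also stored as a JSON array) add to it, so treat the result as a lower bound. The function names are hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw storage for the vectors alone (FLOAT32 = 4 bytes per component)."""
    return num_vectors * dim * bytes_per_component


def human_gib(num_bytes: int) -> str:
    """Format a byte count in GiB (2**30 bytes)."""
    return f"{num_bytes / 2**30:.2f} GiB"


if __name__ == "__main__":
    for dim in (384, 768, 1536):
        print(dim, human_gib(raw_vector_bytes(1_000_000, dim)))
```

Embedding dimension scales memory linearly, so choosing a smaller embedding model is often the single biggest lever before any index tuning.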
Even so, when the core pain is latency + LLM cost + excessive model dependency, it's worth evaluating case by case. For me, it completely changed the game.
Next week I'll cover something that paired very well with Redis Stack: Fat vs Slim vs Hybrid data modeling, and how each model affects cost, performance, and flexibility (plus which one I ended up using to solve my problem).
Were you already familiar with Redis Stack (RedisJSON + RediSearch, including vector search)? How did you use it, and what was the biggest trade-off you faced?