Response Cache
Cache AI prompt responses to eliminate redundant API calls. Save time, reduce costs, and speed up your development workflow with exact and near-match caching.
Get the CLI Tool
Run as an MCP server to cache AI responses locally, or try the demo below.
npx @clinetools/response-cache
- Exact match caching via SHA-256 prompt hashing for instant lookups
- Near-match detection with >90% similarity threshold after normalization
- TTL (time-to-live) support — default 1 hour, configurable per entry
- LRU eviction when the cache exceeds 1,000 entries — least recently used entries are removed first
- Namespace isolation — separate caches for code-review, bug-fix, explanation, etc.
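The LRU eviction described above can be sketched with a JavaScript `Map`, which iterates in insertion order. This class is illustrative, not the tool's actual implementation; the 1,000-entry cap mirrors the default listed above.

```typescript
// Sketch of LRU eviction: a Map iterates in insertion order, so the
// first key is always the least recently used entry.
class LruCache<V> {
  private entries = new Map<string, V>();
  constructor(private maxEntries = 1000) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert to mark this key as most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first in iteration order).
      const oldest = this.entries.keys().next().value!;
      this.entries.delete(oldest);
    }
  }
}
```

Re-inserting on every `get` is what makes this "least recently used" rather than plain FIFO: a recently read entry moves to the back of the eviction queue.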
How to Use It
Three ways to start caching AI responses — pick the one that fits your workflow.
Try Online
Store prompts and look them up in the interactive demo below — no install required.
Use via CLI
Run as a local MCP server. Cache is stored in .cache/response-cache.json by default.
Add to Cline / Claude Code
Add to your MCP settings so your agent caches and retrieves responses automatically.
MCP Client Configuration
{
"mcpServers": {
"response-cache": {
"command": "npx",
"args": ["@clinetools/response-cache"]
}
}
}
Store and Lookup Example
// Store a response
cache_store({
prompt: "Review this Python function for bugs",
response: "The function has a potential null reference on line 12...",
ttl: 7200,
namespace: "code-review"
})
// Look it up later
cache_lookup({
prompt: "Review this Python function for bugs",
namespace: "code-review"
})
// => { hit: true, similarity: 1.0, entry: { response: "..." } }
Cache Management
// View statistics
cache_stats({ namespace: "code-review" })
// => { totalEntries: 42, hitRate: "73.2%", sizeBytes: 51200 }
// Clear old entries (older than 24 hours)
cache_clear({ olderThan: 86400 })
// => { cleared: 15, remaining: 27 }
// Clear a specific namespace
cache_clear({ namespace: "bug-fix" })
// => { cleared: 8, remaining: 34 }
Try It Online
Store prompts, look them up, and explore how caching works.
Cache Operations
Store a prompt/response pair, then look it up to see cache hits and near-matches.
Details
Why Response Caching Matters
Identical prompts to AI APIs waste time and money. Caching solves both.
Semantic Caching
Exact match uses SHA-256 hashing for O(1) lookup speed. Near-match normalizes prompts (lowercase, strip whitespace, remove punctuation) and compares character overlap — prompts with >90% similarity return cached responses.
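A minimal sketch of the two lookup paths, using Node's `crypto` module for hashing. The normalization steps follow the description above; the overlap metric is one plausible reading of "character overlap", not necessarily the tool's exact algorithm.

```typescript
import { createHash } from "node:crypto";

// Exact match: hash the raw prompt for an O(1) cache key.
function exactKey(prompt: string): string {
  return createHash("sha256").update(prompt).digest("hex");
}

// Near match step 1: lowercase, strip whitespace, remove punctuation.
function normalize(prompt: string): string {
  return prompt.toLowerCase().replace(/[\s\p{P}]+/gu, "");
}

// Near match step 2: compare character overlap between normalized
// prompts; 1.0 means identical after normalization.
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  const counts = new Map<string, number>();
  for (const ch of na) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let overlap = 0;
  for (const ch of nb) {
    const c = counts.get(ch) ?? 0;
    if (c > 0) {
      overlap++;
      counts.set(ch, c - 1);
    }
  }
  return overlap / Math.max(na.length, nb.length);
}
```

With this scheme, `"Review this Python function for bugs"` and `"review this python function for bugs!"` normalize to the same string, so their similarity is 1.0 and the second prompt clears the 90% threshold.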
Cost Savings
Every cache hit eliminates an API call. With GPT-4 input priced at $30 per million tokens and prompts averaging roughly 1,000 input tokens, a 50% hit rate on 10,000 daily prompts saves $150/day. Cache hits return instantly, so you also save 1-5 seconds of latency per request.
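The arithmetic behind that figure, spelled out. The average prompt size of 1,000 input tokens is an assumption chosen to make the example concrete:

```typescript
// Back-of-envelope savings: cache hits avoid API calls, and each
// avoided call saves its input-token cost.
const pricePerMillionTokens = 30;  // GPT-4 input pricing, USD
const dailyPrompts = 10_000;
const hitRate = 0.5;
const tokensPerPrompt = 1_000;     // assumed average prompt size

const callsAvoided = dailyPrompts * hitRate;         // 5,000 cache hits
const tokensSaved = callsAvoided * tokensPerPrompt;  // 5,000,000 tokens
const dollarsSaved = (tokensSaved / 1_000_000) * pricePerMillionTokens;
console.log(dollarsSaved); // 150
```

Shorter prompts scale the savings down proportionally; output-token costs, which caching also avoids, would push the real number higher.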
Cache Invalidation
TTL ensures stale responses are automatically purged. Set short TTLs (minutes) for fast-changing contexts and long TTLs (hours/days) for stable references. Clear by namespace to invalidate specific categories without losing everything.
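TTL expiry can be sketched as a simple age check at lookup time. The entry fields below are assumptions for illustration, not the tool's actual storage format.

```typescript
// An entry is stale once its age exceeds its ttl (in seconds,
// matching the ttl parameter shown in the examples above).
interface Entry {
  response: string;
  storedAt: number; // epoch milliseconds
  ttl: number;      // seconds
}

function isFresh(entry: Entry, now: number = Date.now()): boolean {
  return (now - entry.storedAt) / 1000 < entry.ttl;
}

// Stored two hours ago with a one-hour TTL: stale, so a lookup
// would miss and the entry could be purged.
const stale: Entry = { response: "...", storedAt: Date.now() - 7_200_000, ttl: 3600 };
console.log(isFresh(stale)); // false
```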
Namespace Isolation
Separate caches for different prompt types: code-review, bug-fix, explanation, translation. Each namespace has independent stats and can be cleared independently. Prevents cross-contamination between different AI tasks.
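One common way to implement namespace isolation is key prefixing, sketched below. This is an assumed scheme, not necessarily how the tool stores entries, but it shows why clearing one namespace cannot touch another.

```typescript
// Each cache key combines the namespace with the prompt hash, so
// lookups and clears are scoped to one namespace.
function namespacedKey(namespace: string, promptHash: string): string {
  return `${namespace}:${promptHash}`;
}

// Clearing a namespace deletes only keys carrying its prefix and
// returns how many were removed, mirroring cache_clear's result shape.
function clearNamespace(cache: Map<string, string>, namespace: string): number {
  let cleared = 0;
  for (const key of [...cache.keys()]) {
    if (key.startsWith(`${namespace}:`)) {
      cache.delete(key);
      cleared++;
    }
  }
  return cleared;
}
```

Under this scheme, `cache_clear({ namespace: "bug-fix" })` would walk only `bug-fix:`-prefixed keys, leaving `code-review` and every other namespace untouched.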
Cache Every Response
Add the Response Cache to your agent's toolkit and stop paying for the same answer twice.
View Plans