Response Cache
Cache AI prompt responses to eliminate redundant API calls. Save time, reduce costs, and speed up your development workflow with exact and near-match caching.
Get the CLI Tool
Run as an MCP server to cache AI responses locally, or try the demo below.
npx @clinetools/response-cache
- Exact match caching via SHA-256 prompt hashing for instant lookups
- Near-match detection with >90% similarity threshold after normalization
- TTL (time-to-live) support — default 1 hour, configurable per entry
- LRU eviction when the cache exceeds 1,000 entries — least recently used entries are removed first
- Namespace isolation — separate caches for code-review, bug-fix, explanation, etc.
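The LRU eviction described above can be sketched with a JavaScript `Map`, which iterates in insertion order. This class is illustrative, not the tool's actual implementation; the 1,000-entry cap mirrors the default listed above.

```typescript
// Sketch of LRU eviction: a Map iterates in insertion order, so the
// first key is always the least recently used entry.
class LruCache<V> {
  private entries = new Map<string, V>();
  constructor(private maxEntries = 1000) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert to mark this key as most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first in iteration order).
      const oldest = this.entries.keys().next().value!;
      this.entries.delete(oldest);
    }
  }
}
```

Re-inserting on every `get` is what makes this "least recently used" rather than plain FIFO: a recently read entry moves to the back of the eviction queue.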
How to Use It
Three ways to start caching AI responses — pick the one that fits your workflow.
Try Online
Store prompts and look them up in the interactive demo below — no install required.
Use via CLI
Run as a local MCP server. Cache is stored in .cache/response-cache.json by default.
Add to Cline / Claude Code
Add to your MCP settings so your agent caches and retrieves responses automatically.
MCP Client Configuration
{
"mcpServers": {
"response-cache": {
"command": "npx",
"args": ["@clinetools/response-cache"]
}
}
}
Store and Lookup Example
// Store a response
cache_store({
prompt: "Review this Python function for bugs",
response: "The function has a potential null reference on line 12...",
ttl: 7200,
namespace: "code-review"
})
// Look it up later
cache_lookup({
prompt: "Review this Python function for bugs",
namespace: "code-review"
})
// => { hit: true, similarity: 1.0, entry: { response: "..." } }
Cache Management
// View statistics
cache_stats({ namespace: "code-review" })
// => { totalEntries: 42, hitRate: "73.2%", sizeBytes: 51200 }
// Clear old entries (older than 24 hours)
cache_clear({ olderThan: 86400 })
// => { cleared: 15, remaining: 27 }
// Clear a specific namespace
cache_clear({ namespace: "bug-fix" })
// => { cleared: 8, remaining: 34 }
Try It Online
Store prompts, look them up, and explore how caching works.
Cache Operations
Store a prompt/response pair, then look it up to see cache hits and near-matches.
Details
Why Response Caching Matters
Identical prompts to AI APIs waste time and money. Caching solves both.
Semantic Caching
Exact match uses SHA-256 hashing for O(1) lookup speed. Near-match normalizes prompts (lowercase, strip whitespace, remove punctuation) and compares character overlap — prompts with >90% similarity return cached responses.
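A minimal sketch of the two lookup paths, using Node's `crypto` module for hashing. The normalization steps follow the description above; the overlap metric is one plausible reading of "character overlap", not necessarily the tool's exact algorithm.

```typescript
import { createHash } from "node:crypto";

// Exact match: hash the raw prompt for an O(1) cache key.
function exactKey(prompt: string): string {
  return createHash("sha256").update(prompt).digest("hex");
}

// Near match step 1: lowercase, strip whitespace, remove punctuation.
function normalize(prompt: string): string {
  return prompt.toLowerCase().replace(/[\s\p{P}]+/gu, "");
}

// Near match step 2: compare character overlap between normalized
// prompts; 1.0 means identical after normalization.
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  const counts = new Map<string, number>();
  for (const ch of na) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let overlap = 0;
  for (const ch of nb) {
    const c = counts.get(ch) ?? 0;
    if (c > 0) {
      overlap++;
      counts.set(ch, c - 1);
    }
  }
  return overlap / Math.max(na.length, nb.length);
}
```

With this scheme, `"Review this Python function for bugs"` and `"review this python function for bugs!"` normalize to the same string, so their similarity is 1.0 and the second prompt clears the 90% threshold.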
Cost Savings
Every cache hit eliminates an API call. With GPT-4 input priced at $30 per million tokens and prompts averaging roughly 1,000 input tokens, a 50% hit rate on 10,000 daily prompts saves $150/day. Cache hits return instantly, so you also save 1-5 seconds of latency per request.
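The arithmetic behind that figure, spelled out. The average prompt size of 1,000 input tokens is an assumption chosen to make the example concrete:

```typescript
// Back-of-envelope savings: cache hits avoid API calls, and each
// avoided call saves its input-token cost.
const pricePerMillionTokens = 30;  // GPT-4 input pricing, USD
const dailyPrompts = 10_000;
const hitRate = 0.5;
const tokensPerPrompt = 1_000;     // assumed average prompt size

const callsAvoided = dailyPrompts * hitRate;         // 5,000 cache hits
const tokensSaved = callsAvoided * tokensPerPrompt;  // 5,000,000 tokens
const dollarsSaved = (tokensSaved / 1_000_000) * pricePerMillionTokens;
console.log(dollarsSaved); // 150
```

Shorter prompts scale the savings down proportionally; output-token costs, which caching also avoids, would push the real number higher.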
Cache Invalidation
TTL ensures stale responses are automatically purged. Set short TTLs (minutes) for fast-changing contexts and long TTLs (hours/days) for stable references. Clear by namespace to invalidate specific categories without losing everything.
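TTL expiry can be sketched as a simple age check at lookup time. The entry fields below are assumptions for illustration, not the tool's actual storage format.

```typescript
// An entry is stale once its age exceeds its ttl (in seconds,
// matching the ttl parameter shown in the examples above).
interface Entry {
  response: string;
  storedAt: number; // epoch milliseconds
  ttl: number;      // seconds
}

function isFresh(entry: Entry, now: number = Date.now()): boolean {
  return (now - entry.storedAt) / 1000 < entry.ttl;
}

// Stored two hours ago with a one-hour TTL: stale, so a lookup
// would miss and the entry could be purged.
const stale: Entry = { response: "...", storedAt: Date.now() - 7_200_000, ttl: 3600 };
console.log(isFresh(stale)); // false
```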
Namespace Isolation
Separate caches for different prompt types: code-review, bug-fix, explanation, translation. Each namespace has independent stats and can be cleared independently. Prevents cross-contamination between different AI tasks.
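One common way to implement namespace isolation is key prefixing, sketched below. This is an assumed scheme, not necessarily how the tool stores entries, but it shows why clearing one namespace cannot touch another.

```typescript
// Each cache key combines the namespace with the prompt hash, so
// lookups and clears are scoped to one namespace.
function namespacedKey(namespace: string, promptHash: string): string {
  return `${namespace}:${promptHash}`;
}

// Clearing a namespace deletes only keys carrying its prefix and
// returns how many were removed, mirroring cache_clear's result shape.
function clearNamespace(cache: Map<string, string>, namespace: string): number {
  let cleared = 0;
  for (const key of [...cache.keys()]) {
    if (key.startsWith(`${namespace}:`)) {
      cache.delete(key);
      cleared++;
    }
  }
  return cleared;
}
```

Under this scheme, `cache_clear({ namespace: "bug-fix" })` would walk only `bug-fix:`-prefixed keys, leaving `code-review` and every other namespace untouched.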
Cache Every Response
Add the Response Cache to your agent's toolkit and stop paying for the same answer twice.
View Plans