Testing Tool

Performance Benchmark

Micro-benchmark code snippets with precise timing. Compare implementations side by side and find the fastest approach with statistical confidence.

Get the CLI Tool

Run the performance benchmark locally as an MCP server, or try it online below.

npx @clinetools/benchmark
Requires Node.js 18+
  • Precise timing with performance.now() — sub-millisecond accuracy
  • Automatic warm-up runs to eliminate JIT compilation noise
  • Statistical analysis: std deviation, margin of error, confidence intervals
  • Side-by-side comparison of two implementations with speedup ratio
  • Configurable iteration count — from quick checks to thorough analysis
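
The timing and statistics above can be sketched roughly like this, assuming Node.js 18+ where performance.now() is a global. The function name timeIt and its result fields are illustrative, not the tool's actual API:

```javascript
// Minimal sketch of a timing loop with basic statistics.
function timeIt(fn, iterations = 1000) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now(); // sub-millisecond monotonic clock
    fn();
    samples.push(performance.now() - start); // per-iteration time in ms
  }
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((a, b) => a + (b - mean) ** 2, 0) / (samples.length - 1);
  const stdDev = Math.sqrt(variance);
  // 95% half-width of the confidence interval on the mean (normal approximation)
  const marginOfError = (1.96 * stdDev) / Math.sqrt(samples.length);
  return { mean, stdDev, marginOfError, opsPerSec: 1000 / mean };
}
```

A real harness adds warm-up runs and outlier handling on top of this core loop.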

How to Use It

Three ways to benchmark your code — pick the one that fits your workflow.

1. Try Online

Paste two implementations below to compare their performance — no install needed.

2. Use via CLI

Run as a local MCP server. Your AI agent can benchmark code during development.

npx @clinetools/benchmark

3. Add to Cline / Claude Code

Add to your MCP settings so your agent benchmarks code automatically.

"benchmark": { "command": "npx", "args": ["@clinetools/benchmark"] }

MCP Client Configuration

{
  "mcpServers": {
    "benchmark": {
      "command": "npx",
      "args": ["@clinetools/benchmark"]
    }
  }
}

Example: Compare Implementations

// Prompt to your AI agent:
"Benchmark these two sorting approaches and
tell me which is faster"

// The agent calls:
run_benchmark({
  code: "const arr = [5,3,1,4,2]; arr.sort((a,b) => a-b);",
  compareWith: "const arr = [5,3,1,4,2]; /* manual bubble sort */",
  label: "Array.sort()",
  compareLabel: "Bubble Sort",
  iterations: 5000
})

// Output shows ops/sec, avg time, min/max,
// margin of error, and which is faster

Preference Conversation

// On first run, the tool asks:

1. "How many iterations by default?"
   [ ] 100  - Quick estimate
   [x] 1000 - Balanced accuracy
   [ ] 5000 - High precision

2. "Include warm-up phase?"
   [x] Yes - skip first 10% of runs (recommended)
   [ ] No  - measure all iterations

3. "Output format preference?"
   [x] Full stats (ops/sec, avg, min, max, margin)
   [ ] Summary only (ops/sec + comparison)
   [ ] JSON (machine-readable)

// Preferences saved to .clinetools/benchmark.json
// Remembered for all future runs

Live Demo

Try It Online

Enter two implementations to benchmark and compare their performance.


Why Benchmarking Matters

Understanding performance helps you make informed decisions about code tradeoffs.

Micro vs Macro Benchmarks

Micro-benchmarks measure isolated operations (array sort, string concat). Macro-benchmarks measure entire workflows. Both matter, but micro-benchmarks reveal which specific patterns are faster. Use them to validate assumptions about JS engine optimizations.

Statistical Significance

A single run means nothing. Margin of error and standard deviation tell you whether a 5% speedup is real or just noise. Always check the confidence interval — if the margin of error is larger than the difference, the results are inconclusive.
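
That inconclusive check can be expressed directly. This is a sketch, not the tool's output format; compare and its input fields are hypothetical names:

```javascript
// Sketch: decide whether a measured difference is statistically meaningful.
// Inputs are result objects with mean (ms) and marginOfError (ms).
function compare(a, b) {
  const diff = Math.abs(a.mean - b.mean);
  // If the combined error bands cover the difference, we can't call a winner.
  if (diff <= a.marginOfError + b.marginOfError) return "inconclusive";
  return a.mean < b.mean ? "A is faster" : "B is faster";
}
```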

Warm-up Runs

JavaScript engines use JIT compilation — code gets faster after repeated execution. The first few runs are slower while the engine optimizes. Warm-up runs (10% of iterations) let the JIT stabilize before measurement begins, giving more accurate results.
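
One way to sketch that discard step, assuming timing samples are collected per iteration. The 10% cutoff mirrors the default described above, but dropWarmup is an illustrative name, not part of the tool:

```javascript
// Sketch: drop the first 10% of timing samples as warm-up, on the
// assumption that the JIT has stabilized by then.
function dropWarmup(samples, fraction = 0.1) {
  const cut = Math.floor(samples.length * fraction);
  return samples.slice(cut); // keep only post-warm-up measurements
}
```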

Environment Factors

CPU throttling, background processes, garbage collection, and browser tab focus all affect results. Run benchmarks multiple times and close other applications. In-browser benchmarks give relative comparisons; for absolute numbers, use the CLI tool in a controlled environment.
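
A simple way to aggregate repeated runs is to take the median of the per-pass means, which shrugs off one-off spikes from GC pauses or background load. A sketch; the tool's actual aggregation may differ:

```javascript
// Sketch: median of per-pass mean times, robust to a single noisy pass.
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// median([2.1, 2.0, 9.8]) → 2.1 (the 9.8 spike doesn't skew the result)
```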

Benchmark Every Change

Add the Performance Benchmark to your agent's toolkit and catch regressions before they ship.

View Plans