Context-maxxing as a service

Context goes in dirty.
It comes out clean.

A model-agnostic API that transforms raw conversation history into an optimized, compressed messages array — ready for your next LLM call.

One call. Smaller context.

Pass your messages array. Get back a distilled version. Drop it straight into your next LLM call.

POST /distill
// Before: 4,200 tokens
const result = await fetch('https://contextspa.com/api/distill', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json', // the body is a JSON string, so declare it
  },
  body: JSON.stringify({
    messages: conversationHistory,
    strategy_id: 'technical_dense',
  }),
});

// After: 610 tokens (85% reduction)
const { messages, metadata } = await result.json();

Everything your LLM doesn't need to see,
gone.

Your conversation history accumulates noise: pleasantries, restatements, abandoned threads. contextspa strips that out and hands back only the signal.

⚗️

Stateless transformation

Messages in, messages out. No storage. No memory. No side effects. Your data doesn't live here.

🎛️

Named strategies

Choose how your context is distilled. Technical dense, aggressive summarize, decision extraction — or write your own.

🔌

Model-agnostic

Works with any downstream model. Feed the output to GPT-4, Claude, Gemini, or your local model — doesn't matter.

📐

Inject overrides

Append a short instruction to any strategy at call time. "Preserve all mentions of variable authToken" — and it will.
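As a rough sketch of how an inject override might be attached to a request: the `inject` field name and the `buildDistillRequest` helper below are assumptions for illustration, since this page does not name the field. The endpoint URL, `messages`, and `strategy_id` are taken from the example at the top of the page.

```javascript
// Hypothetical helper: builds the fetch() arguments for a /distill call.
// The `inject` field name is an assumption, not a documented parameter.
function buildDistillRequest(apiKey, messages, strategyId, inject) {
  const payload = { messages, strategy_id: strategyId };
  // Only send the override when the caller supplies a non-empty instruction.
  if (typeof inject === 'string' && inject.trim() !== '') {
    payload.inject = inject.trim();
  }
  return {
    url: 'https://contextspa.com/api/distill',
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(payload),
    },
  };
}

const { url, init } = buildDistillRequest(
  'YOUR_API_KEY',
  [{ role: 'user', content: 'Why does authToken expire early?' }],
  'technical_dense',
  'Preserve all mentions of variable authToken',
);
// Send it with: const response = await fetch(url, init);
```

Keeping the override out of the payload when it is empty means the same helper covers both plain and overridden calls.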


Three strategies, shipped on day one.

Pick the right distillation mode for your use case. Full schema and author guide at docs → strategies.

summarize

Aggressive Summarize

Reduces conversation to essential outcomes. Preserves conclusions, decisions, and open questions. Everything else: summarized to one sentence.

compression: 8–15% of input tokens
provider: gemini-flash

technical_dense

Technical Dense

Preserves all code blocks, error messages, variable names, and architectural decisions verbatim. Strips conversational filler aggressively.

compression: 10–20% of input tokens
provider: gemini-flash

extract_decisions

Extract Decisions

Extracts only explicit decisions, commitments, and their rationale. Everything else is dropped. Best before a planning or review session.

compression: 5–10% of input tokens
provider: gemini-flash

Community strategies coming soon. See docs for the strategy schema and author guide.
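To get a feel for what each strategy might return, here is a small sketch that turns the compression ranges on the cards into a token estimate. It assumes the percentages describe output size as a fraction of input tokens, which matches the 4,200 to 610 example at the top of the page. The `estimateOutputTokens` helper is hypothetical and not part of the API.

```javascript
// Ranges copied from the strategy cards, read as output size / input size.
// That reading is an assumption; the page does not define "compression".
const STRATEGY_RATIOS = {
  summarize: [0.08, 0.15],
  technical_dense: [0.10, 0.20],
  extract_decisions: [0.05, 0.10],
};

// Hypothetical helper: estimates the distilled size for a given input size.
function estimateOutputTokens(strategyId, inputTokens) {
  const range = STRATEGY_RATIOS[strategyId];
  if (!range) {
    throw new Error(`Unknown strategy: ${strategyId}`);
  }
  const [low, high] = range;
  return {
    min: Math.round(inputTokens * low),
    max: Math.round(inputTokens * high),
  };
}

const estimate = estimateOutputTokens('technical_dense', 4200);
// estimate is { min: 420, max: 840 }; the 610-token result above falls inside it.
```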


Drop it into your existing pipeline.

contextspa sits between your conversation history and your next LLM call. No agent rewiring. No new memory layer.

Call /distill

POST your messages array with a strategy ID and your API key. Optionally append a one-off inject instruction.

Strategy executes

The strategy's system prompt runs against your messages using Gemini Flash. Output is a clean messages array.

Use the output

Drop the returned messages array directly into your next call. Same format as the input — no adapter needed.
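A minimal sketch of that last step, assuming messages are `{ role, content }` objects as in common chat APIs. The `buildNextCall` helper and the model name are placeholders for illustration; the request shape your downstream provider expects may differ.

```javascript
// Hypothetical helper: appends the next user turn to the distilled history.
// Assumes { role, content } message objects; adjust for your provider.
function buildNextCall(distilledMessages, nextUserMessage, model) {
  return {
    model,
    // Copy the array so the distilled history itself is not mutated.
    messages: [
      ...distilledMessages,
      { role: 'user', content: nextUserMessage },
    ],
  };
}

// Stand-in for the `messages` array returned by /distill.
const distilled = [
  { role: 'user', content: 'Auth service returns 401 after token refresh.' },
  { role: 'assistant', content: 'Decision: rotate authToken before expiry.' },
];

const nextCall = buildNextCall(distilled, 'Write the rotation code.', 'your-model-name');
// nextCall.messages now holds the two distilled turns plus the new user turn.
```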

Pay per token

Deposit credits, use them. You'll typically spend $0.01–$0.05 to save 80–95% of your downstream context cost.
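Whether a given call pays for itself depends on your downstream price, so it can help to run the arithmetic. The sketch below is illustrative only: the token counts, the downstream price, and the distillation cost are hypothetical placeholders, and `netSavingsUsd` is not part of the API.

```javascript
// Hypothetical helper: downstream savings minus what the distillation cost.
// Every number passed in below is a placeholder; substitute your own.
function netSavingsUsd({ inputTokens, outputTokens, pricePerMillionTokens, distillCostUsd }) {
  const tokensSaved = inputTokens - outputTokens;
  const downstreamSavings = (tokensSaved / 1_000_000) * pricePerMillionTokens;
  return downstreamSavings - distillCostUsd;
}

const net = netSavingsUsd({
  inputTokens: 120_000,     // raw conversation history
  outputTokens: 12_000,     // distilled history (10% of input)
  pricePerMillionTokens: 3, // assumed downstream input price, in USD
  distillCostUsd: 0.03,     // assumed cost of the /distill call
});
// A positive `net` means the distillation paid for itself on this one call.
```

Small histories leave little to save, so the result can be close to zero or negative at low token counts.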


Start distilling.

Get an API key, call /distill, see compression in under 10 seconds.