Observation Masking

Stop paying to resend file reads, build logs, and git diffs that your agent will never reference again.

Your context window is full of stale output

The Problem

In a multi-turn coding session, every request resends the entire conversation history. By turn 20, you're paying for 15,000 tokens of old file reads, build logs, test results, and git diffs. The agent won't look at any of it again, but you're billed for every token on every subsequent turn.

The Solution

Observation Masking detects tool outputs from earlier turns and replaces them with a one-line placeholder before the request is sent. The most recent 2 messages stay intact. Research from JetBrains shows this outperforms LLM-based summarization because agents re-read files when they need to instead of trusting a lossy summary.

Before and After

Turn 20 Without Masking

Total context: ~18,000 tokens Turn 3: read_file output (800 tokens) Turn 5: npm test output (1,200 tokens) Turn 8: git diff output (600 tokens) Turn 12: build log (900 tokens) ... all resent on every turn

Turn 20 With Masking

Total context: ~6,000 tokens Turn 3: [output omitted — 800 tokens saved] Turn 5: [output omitted — 1,200 tokens saved] Turn 8: [output omitted — 600 tokens saved] Turn 12: [output omitted — 900 tokens saved] ... only last 2 messages preserved in full

Token savings: ~50%

How It Works

A request arrives with a multi-turn conversation history

The proxy identifies tool outputs from earlier turns (file reads, build logs, test results, git diffs)

Large outputs are replaced with compact placeholders preserving metadata

Recent messages (last 2) are always left intact. The optimized request forwards to the provider

Frequently Asked Questions

Won't the model lose important context?

No. The most recent messages are always preserved. For older context, the model re-reads the file if it needs it. JetBrains research shows this actually improves agent performance.

What types of output are masked?

Tool results (file contents, command output), large code blocks in assistant responses, build/test logs, stack traces, JSON blobs, and directory listings.

What's the minimum size for masking?

Outputs under 500 characters are left intact. Only large outputs that contribute to context bloat are replaced.

Does masking work with the SDK?

Yes. Masking runs server-side at the proxy, so it works with both the proxy URL and SDK integration paths.

Related Tools

Ready to start saving?

Create an account, add your first app, and swap one URL. Takes about 5 minutes.

Get Started Free