Productivity · Developer Experience

The Real Cost of Context Window Churn

January 11, 2026 · 6 min read

If you use AI coding assistants for real work, you know the feeling. You've been working on a complex feature for an hour. Claude understands your codebase, your decisions, your constraints. Then the context window fills up. And you start over.

The Hidden Tax

Context window churn isn't just annoying. It's expensive in three ways:

  • Time - Re-explaining your project state, decisions, and constraints. Often 5-10 minutes per reset.
  • Tokens - Paying for the same context multiple times. If you lose context mid-task, you're re-sending what the model already knew.
  • Quality - The AI loses nuance. Decisions made earlier in the session that informed current work are forgotten.

The Real Numbers

In a typical 2-hour Claude Code session, we measured:

  • 3-5 context resets
  • 20-30 minutes lost to re-explaining context
  • 15-25% of total tokens spent on redundant context

Traditional Approaches Don't Work

Most solutions to context management have significant drawbacks:

LLM Compaction

Claude Code's built-in compaction uses the model itself to summarize context. It's better than nothing, but:

  • Takes 30-60 seconds (an eternity when you're in flow)
  • Lossy - the model decides what's "important," not you
  • Still uses tokens for the summarization step

Manual Context Files

Some developers maintain README files or scratch pads with context. This helps, but:

  • Requires manual maintenance
  • Falls out of sync with actual state
  • Doesn't capture the nuanced back-and-forth that built understanding

Just Start a New Session

The "solution" many developers use by default. But you lose everything the AI learned about your project, your preferences, and your current task state.

A Different Approach: Snapshots

We built momentum to solve this differently. Instead of trying to compress context, we save it at task boundaries and restore it instantly after /clear.

# Traditional flow
[work] → [context full] → [compaction: 30-60s] → [continue]

# momentum flow
[work] → [save snapshot] → [/clear] → [restore: <5ms] → [continue]

The key insight: SQLite reads are instant. We don't need to re-process or re-summarize anything. Just read the snapshot and inject it into the new context.

Restore Speed

We measured restore times across different snapshot sizes:

| Stored Tokens | Restore Time | vs LLM Compaction |
| ------------- | ------------ | ----------------- |
| 10,000        | ~1ms         | ~26,000x faster   |
| 50,000        | ~2.5ms       | ~12,000x faster   |
| 100,000       | ~4ms         | ~7,000x faster    |
| 150,000       | ~5ms         | ~6,000x faster    |

Benchmarks on M1 MacBook Pro using Bun's native SQLite. Your results may vary but will be in the same ballpark.

What Gets Saved

A momentum snapshot captures:

  • Summary - What you were working on
  • Key files - Files relevant to the current task
  • Decisions made - Technical choices and their rationale
  • Blockers - What was stopping progress
  • Code state - Important variables, configurations
  • Recent messages - The last few exchanges for context

The Workflow

momentum integrates as a Claude Code plugin with these tools:

  1. Work normally - Claude saves snapshots automatically at task boundaries (configurable)
  2. Context fills up - You notice things getting slow or hit Claude's limit
  3. Run /clear - Clears the context window
  4. Claude calls restore_context - Latest snapshot is loaded instantly
  5. Continue where you left off - No re-explanation needed

Why Not Just Bigger Context Windows?

Context windows are getting larger. Claude supports 200K tokens. GPT-4 has 128K. Why not just use more context?

Three reasons:

  • Cost - A larger context means more tokens per request, which means more money. At scale, this matters.
  • Latency - Larger context windows are slower. The model has to process all that context for every response.
  • Attention degradation - Studies show models perform worse with very long contexts. Important information in the "middle" gets less attention.

Smart context management isn't about stuffing more tokens into the window. It's about having the right context available when you need it.

Try It

momentum is free and open source. Install it in Claude Code:

/plugin install momentum@substratia-marketplace

Requires Bun runtime. If you don't have it:

curl -fsSL https://bun.sh/install | bash

The Ecosystem

momentum handles short-term context (within a session). For long-term memory across sessions, use memory-mcp.

Substratia Team
Building memory infrastructure for AI