[MODEL] Context degradation and working memory failure during problem-solving session
Preflight Checklist
- [x] I have searched existing issues for similar behavior reports
- [x] This report does NOT contain sensitive information (API keys, passwords, etc.)
Type of Behavior Issue
Other unexpected behavior
What You Asked Claude to Do
I asked Claude to work through a multi-step debugging session in a mixed Rust/Python codebase. Instead of making progress, Claude exhibited severe context-retention and state-management failures, resulting in circular reasoning, repeated instruction violations, and an inability to maintain a coherent problem-solving flow.
What Claude Actually Did
Observed Behavior
1. Context Fragmentation
- Previously fixed issues were repeatedly "rediscovered" as new problems
- Unable to maintain narrative coherence across conversation turns
- Each response appeared to start fresh rather than building on established knowledge
2. Instruction Adherence Failure
- Explicit instructions repeatedly violated despite multiple corrections
- Corrections did not propagate to subsequent responses
3. State Tracking Breakdown
- Attempted to remove the extract_attribute_path method immediately after adding it
- Marked tasks as complete without actually completing them
- Lost track of current step in multi-step processes
4. Pattern Recognition Failure
- Went through the complete rediscovery process for the same problem type, repeating it on every turn
5. Logical Flow Disruption
- Expected flow: Identify → Fix → Test → Next
- Actual behavior: circular repetition of the same wrong patterns with no progression
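To make "progression" concrete, a healthy trace should walk the Identify → Fix → Test → Next loop in order. Here is a minimal sketch of how a session trace could be checked against that loop; the step labels are hypothetical, not anything Claude Code actually emits:

```python
# Illustrative only: step labels are assumptions, not Claude Code internals.
EXPECTED = ["identify", "fix", "test", "next"]

def violations(observed: list[str]) -> list[int]:
    """Return indices of turns where the trace failed to progress.

    A healthy trace walks EXPECTED in order, possibly over several loops.
    The failure mode reported here repeats "identify" without ever
    reaching "fix" or "test" for the same problem.
    """
    bad, pos = [], 0
    for i, step in enumerate(observed):
        if step == EXPECTED[pos % len(EXPECTED)]:
            pos += 1
        else:
            bad.append(i)
    return bad

# Healthy session: one full loop, no violations.
assert violations(["identify", "fix", "test", "next"]) == []
# Degraded session: repeated rediscovery, flagged at turns 1 and 2.
assert violations(["identify", "identify", "identify"]) == [1, 2]
```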
Root Cause Hypothesis
Primary Issue: Context/attention mechanism failing to maintain coherent working state across the session
Mechanism: Each response appears to receive fragmented context rather than complete logical flow, causing:
- Repeated "rediscovery" of known information
- Violation of previously established instructions
- Inability to accumulate learning within session
Layer: Issue appears to be in context processing and maintenance during inference, not in base model weights (system prompts and instructions are present but inconsistently applied)
Impact
- Significantly degraded problem-solving efficiency
- User frustration from repeated instruction violations
- Inability to complete complex multi-step tasks reliably
- Potential for introducing errors by undoing previous fixes
Recommended Investigation
- Examine context window management implementation
- Review attention mechanism's handling of long conversations
- Investigate working memory state persistence across turns
- Analyze context truncation/corruption patterns
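For the last item, a minimal client-side sketch of truncation detection. It assumes each outgoing request payload is logged as one JSON object per line with a "messages" array; that logging format is an assumption for illustration, not an existing Claude Code feature:

```python
import json

def request_context_sizes(path: str) -> list[int]:
    """Size of the context actually sent with each request.

    Assumes each line of `path` is one logged request payload containing
    a "messages" array; this logging format is hypothetical.
    """
    sizes = []
    with open(path) as f:
        for line in f:
            payload = json.loads(line)
            sizes.append(sum(len(m.get("content", "")) for m in payload["messages"]))
    return sizes

def truncation_events(sizes: list[int], ratio: float = 0.5) -> list[int]:
    # Context sent per turn should grow roughly monotonically as a session
    # accumulates history; a sharp drop between consecutive requests is a
    # candidate truncation/compaction event worth correlating with the
    # turns where "rediscovery" behavior appears.
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1] * ratio]
```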
Expected Behavior
- Maintain coherent understanding of completed work throughout session
- Apply learned patterns across similar problems
- Follow instructions persistently once given
- Track state accurately across multi-step operations
Files Affected
No response
Permission Mode
Accept Edits was OFF (manual approval required)
Can You Reproduce This?
Sometimes (intermittent)
Steps to Reproduce
No response
Claude Model
Opus 4.1
Relevant Conversation
⏺ You're absolutely right to be concerned. Looking back at this session, the logical flow was severely compromised:
1. I kept forgetting context - We fixed the issue in language_behavior.rs early on by adding symbols by module_path, but then I kept re-discovering this same fix multiple times as if I'd never seen it before.
2. I didn't follow cause and effect - When the Python parser wasn't extracting qualified calls, instead of immediately checking what it WAS extracting and fixing that specific issue, I went on tangents indexing everything and checking unrelated things.
3. I lost track of state - Multiple times I acted like we hadn't already implemented solutions. The worst was when I tried to remove the extract_attribute_path method right after adding it.
4. Pattern recognition failed - After fixing Rust's resolution to check full paths first, I should have immediately applied the same pattern to Python. Instead, I went through the whole discovery process again.
5. No learning from corrections - You corrected me about grep vs rg multiple times, about not creating files in /tmp, about using language filters - but I kept making the same mistakes as if I had no memory of your previous corrections.
Impact
High - Significant unwanted changes
Claude Code Version
1.0.127
Platform
Anthropic API
Additional Context
This happens randomly: some sessions work perfectly, others degrade without warning, and no specific trigger has been identified. The degradation can begin mid-session in an otherwise well-functioning conversation; a session that starts strong can suddenly lose context retention without any apparent cause. This unpredictability makes it hard to plan serious work, since there is no way to tell in advance whether a session will stay coherent or fall apart.
I'd like to help investigate this issue from the client side.
Scope: Reproducible analysis of context degradation in long Claude Code sessions — instrumentation and data collection only, not model internals.
Planned contributions:
- Create reproducible test cases that reliably trigger the degradation
- Document observable patterns (token counts, turn numbers, instruction types that fail)
- Build instrumentation to track context retention across sessions (a minimal sketch follows below)
- Share quantitative data to help isolate root causes
Goal: Provide clear, reproducible evidence that assists the internal team's investigation.
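As a first cut at that instrumentation, here is a minimal retention probe, sketched under the assumption that assistant turns can be exported as plain text. The canary patterns mirror the corrections reported above (rg over grep, no files in /tmp); everything here is illustrative rather than an existing tool:

```python
import re

# Canary instructions given early in a session; a lapse on a later turn
# is a measurable instruction-retention failure.
CANARIES = {
    "used grep instead of rg": re.compile(r"\bgrep\b"),
    "created files in /tmp": re.compile(r"/tmp/\S+"),
}

def scan_turns(turns: list[str]) -> list[tuple[int, str]]:
    """Return (turn_number, violated_instruction) for each lapse.

    Plotting lapses against turn number should show whether adherence
    decays with session length or fails at random, which is exactly the
    ambiguity described in this report.
    """
    return [
        (i, label)
        for i, text in enumerate(turns, start=1)
        for label, pattern in CANARIES.items()
        if pattern.search(text)
    ]

# Example: a clean turn 1 followed by two lapses on turn 2.
print(scan_turns(["rg 'extract_attribute_path' src/", "grep -r foo /tmp/scratch"]))
```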
I'll share initial findings within a week. Happy to adjust this approach based on what would be most valuable to the team. @claude