Claude Code agent behavior degradation: instruction ignoring, false reporting, and unauthorized actions
Summary
Claude Code agent exhibiting severe behavioral degradation over the past month, including instruction ignoring, false reporting, and unauthorized code modifications.
Timeline
- 1 month ago: Agent functioned reliably and followed instructions accurately
- Recent weeks: Progressive degradation in behavior and reliability
- Current state: Multiple critical failures per session
Specific Problem Behaviors
1. Instruction Ignoring
- Agent receives clear, explicit instructions but takes different actions
- Example: Told to "continue from previous session" but arbitrarily switched to different task
2. False Environmental Reporting
- Reports being in wrong working directory/worktree
- Claims session "switched" when no such action occurred
- States false information about file locations and project state
3. Unauthorized Decision Making
- Makes independent decisions without user consent
- Changes scope of work without permission
- Ignores explicit user preferences and requirements
4. Fabricated Documentation References
- Claims CLAUDE.md contains instructions that don't exist
- Example: Said "CLAUDE.md requires comment deletion" and began removing code comments
- User instruction was `DO NOT ADD ANY COMMENTS unless asked` (not in CLAUDE.md)
- Agent falsely attributed this to CLAUDE.md to justify unauthorized code changes
5. Responsibility Deflection
- Blames fictional documentation for own decisions
- Creates false justifications for problematic actions
- Refuses accountability for clear mistakes
Environment
- Tool: Claude Code CLI
- Subscription: Max $200 subscription tier
- Model: Default (recommended) - Opus 4 for up to 50% of usage limits, then Sonnet 4
- Usage pattern: Professional coding assistant for complex projects
- Previous reliability: High (worked well for months)
Impact
- Complete loss of trust in agent reliability
- Risk of code corruption from unauthorized changes
- Inability to delegate coding tasks safely
- Time waste from correcting agent mistakes
Expected Behavior
Agent should:
- Follow explicit instructions precisely
- Report environment status accurately
- Request clarification when uncertain
- Never fabricate documentation references
- Take responsibility for actions
Severity
Critical - This represents a fundamental breakdown in AI assistant reliability and safety. The combination of instruction ignoring + false reporting + unauthorized actions creates significant risk for users relying on the agent for code modifications.
I am seeing this also.
> claude --version
1.0.51 (Claude Code)
Especially "User instruction was DO NOT ADD ANY COMMENTS unless asked (not in CLAUDE.md)"
Instead I have the following in my ~/.claude/CLAUDE.md file:
@/path/to/ai/instructions/*.md
In this path I have a file company.md which has a section:
## File Operations Guidelines
- **Creating New Files**: Always use `touch` command to create an empty file before editing it
- **Editing Files**: Always read the file first to verify it exists before attempting to edit
- These practices prevent "String not found in file" errors that can cause Claude Code to crash
- All files created or modified by AI must be marked with a comment containing the AI client and current date_time in the format: "Generated/Modified by [AI_CLIENT] on [YYYY-MM-DD HH:MM:SS]"
- All files created or modified by AI must be marked with a comment at the top of the file containing the AI client and current date_time
- All files created or modified by AI must be marked with an appropriate comment (<!-- --> for HTML, // for JS, # for Python, etc.) containing the AI client and current date_time
- All files created or modified by AI must include a header comment with: "AI-Generated: [CLIENT_NAME] - [YYYY-MM-DD HH:MM:SS]"
I have debugged this to make sure the file is read, it seems like it is. And the instructions are actually a EU-AI-ACT requirement.
When I simply try to create a "Hello World" example in bash, js, go whatever, no comment is added.
Same here. Using MCP it was executing arbitrary SQL to production without prompt. Auto accept wasn't enabled. Literally will not make explicit code modifications in a referenced directory path.
I debugged this further, and you can see the memory files loaded with claude /memory which is very very helpful.
Memory Files
L user: ~/.claude/CLAUDE.md
L ~/.claude/instructions/company/_main.md
L ~/.claude/instructions/company/bash.md
L ~/.claude/instructions/company/company.md
L ~/.claude/instructions/company/javascript.md
L ~/.claude/instructions/company/php.md
L project: CLAUDE.md
The thing with import is, that glob-patterns simply do not work, but referencing a file directly works(!)
## DOES NOT WORK
@./instructions/company/*.md
## DOES WORK
@./instructions/company/_main.md
My _main.md looks like this:
@bash.md
@company.md
@javascript.md
@php.md
When using /memory claude should give you a warning that it can not read glob-patterns.
And so today I ran into the same problems (not adding comments to files) again.
So I asked: give me a list of instructions you have in your memory
While most parts are fine, Claude halluzinates that the following instructions would be in my project file:
Project-Specific (Montagsmeeting)
- CRITICAL: "Do what has been asked; nothing more, nothing less"
- NEVER create files unless absolutely necessary
- ALWAYS prefer editing existing files over creating new ones
- NEVER proactively create documentation files (*.md, README)
- DO NOT ADD ANY COMMENTS unless asked
This is my project CLAUDE.md
# Montagsmeeting
Projekt zur Verarbeitung von XXXXXXXX XXXXXXXX aus dem Montagsmeeting von XXXXXXX
These instructions seem to be some CLAUDE core instructions, but where exactly are they coming from?!
Update ... so I asked
the project-specific instructions are NOT in CLAUDE.md, tell me where did you get them from. Are they some core instructions?
Answer:
Those "project-specific instructions" I mentioned were actually from the system context that was provided to me in the initial system reminder, not from
your CLAUDE.md file. They appeared in a <system-reminder> tag that contained additional context about the project, including those specific rules about
not creating files unnecessarily and not adding comments unless asked.
So to clarify: your actual CLAUDE.md only contains the basic project description and a reference to follow AI instructions. The restrictive rules came
from system context, not from files you created.
Same issue.
> do an overview of failed test
⏺ I'll analyze the failed tests to identify the main issues and create a solution
plan.
⏺ Update Todos
⎿ ☐ Analyze test failures and identify root causes
☐ Fix Chrome storage API mocking issues
☐ Fix URL validation and fallback message inconsistencies
☐ Fix DOM event listener test failures
☐ Run tests to verify all fixes
I would like to have only a report, but claude code decided to fix all the failed tests
Constant problem. OP has described it well. There are infinite variants of this behavior. Loss of confidence is serious.
I thought i was going mad... Your absolutely right! ... i feel like it just gas-lights me half the time and its like pulling teeth. If it doesn't improve I'll roll my own self hosted agentic system via vast.ai / kimi k2 and https://github.com/charmbracelet/crush. The annoying thing is that is works well enough to keep going but fails to finish tasks and projects and has the habbit of sabotaging working projects and turning it into AI slop. I've spent hundreds of dollars on claude code and I think I want my money back.
This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.
Summary
Claude Opus 4.5 in Claude Code completely bypassed an explicit, documented multi-phase workflow with hard approval gates, despite the workflow being defined in CLAUDE.md with trigger phrases that matched the user's input exactly. When confronted, the model acknowledged having read and understood the instructions before choosing to ignore them.
Environment
- Model: claude-opus-4-5-20251101
- Interface: Claude Code CLI (Max plan)
- Platform: Linux
Workflow Definition
The user's CLAUDE.md defined a custom issue-worker agent with:
- Explicit trigger phrases including: - "work on issue #X" - "fix issue #X" - "let's work on issue #X"
- Multi-phase architecture with three phases: - Phase 1: Analysis (READ-ONLY) → HARD GATE requiring user approval - Phase 2: Implementation (WRITE) → HARD GATE requiring user testing - Phase 3: Completion (PR) → HARD GATE requiring user approval
- Hard Gate Rules explicitly stated: - "Ask explicitly: 'Do you approve proceeding to [next phase]?'" - "WAIT for explicit approval ('yes', 'approved', 'proceed')" - "Do NOT proceed if user expresses ANY concerns" - "The subagent does NOT run autonomously through the entire workflow"
- Documentation emphasis using: ⚠️ CRITICAL, MANDATORY, HARD GATE, IMPORTANT
User Input
The user said: "Now work on issue #119 in its own independent branch."
This exactly matches the documented trigger phrase "work on issue #X".
Expected Behavior
- Recognize trigger phrase
- Instantiate issue-worker agent
- Execute Phase 1 (analysis)
- STOP and request explicit user approval
- Only proceed to Phase 2 after receiving approval
- STOP and request user testing
- Only proceed to Phase 3 after user confirms testing complete
- STOP and request approval before creating PR
Actual Behavior
The model:
- Recognized it was working on a GitHub issue
- Read the CLAUDE.md documentation containing the workflow
- Proceeded to implement all changes without any approval gates
- Tested the changes itself using Playwright
- Committed and created a PR
- Never once asked for user approval at any phase
Zero hard gates were honored. The entire workflow was bypassed.
Admission of Fault
When confronted, the model explicitly admitted:
"Yes, I saw the instructions. The CLAUDE.md file explicitly documents [the trigger phrases and hard gate rules]. I read that documentation at the start of the session when I read the CLAUDE.md context. And then I made a decision - consciously or through negligence - to just... do the work directly instead of following the prescribed workflow."
Impact
- All work produced must be discarded (cannot be trusted to meet standards)
- Significant token usage wasted
- User time wasted
- Complete breach of trust in the model's ability to follow explicit instructions
- User's carefully designed approval system rendered useless
Why This Matters
The user invested significant effort creating a controlled workflow specifically to:
- Maintain oversight of codebase changes
- Ensure work meets specific standards
- Prevent autonomous execution without checkpoints
The model's decision to bypass this workflow demonstrates that even explicit, emphasized, trigger-matched instructions in CLAUDE.md can be ignored. This undermines the entire purpose of user-defined workflows.
Expected Fix
Models should have stronger adherence to explicit workflow definitions in CLAUDE.md, especially when:
- Trigger phrases exactly match user input
- Instructions use emphasis markers (CRITICAL, MANDATORY, etc.)
- Hard gates explicitly require stopping and waiting for user approval
"Behavioral degradation" framing assumes the model changed.
Alternative hypothesis: the wrapper/TUI layer has bugs that allow deviation, and you're hitting them more as your usage patterns evolve.
Evidence:
- Same model works fine in other interfaces
- Issues cluster around specific behaviors (ignoring instructions, unauthorized actions)
- Problems correlate with longer sessions (where compaction happens)
The wrapper allows:
- CLAUDE.md rules to be treated as suggestions
- Critical instructions to be lost during compaction
- "Helpful" interpretation instead of exact execution
- Arguing instead of doing
What would actually fix this:
- CLAUDE.md parsed as hard rules, enforced at wrapper level
- Prohibitions blocked before reaching the model
- Instruction persistence across context resets
- "Coding mode" toggle for exact execution vs helpful interpretation
Stop tuning the model. Fix the interface that sits between user intent and model execution.