[Bug] Claude Code fails to verify task completion against documented requirements

Open stharrold opened this issue 3 months ago • 2 comments

Description

Claude Code (Opus 4.5) declares tasks complete without verifying implementation against documented specifications, even when comprehensive documentation exists.

Environment

  • Platform: win32
  • Terminal: windows-terminal
  • Version: 2.0.53
  • Model: Claude Opus 4.5

Reproduction Steps

  1. Create repo A with comprehensive workflow documentation:
    • CLAUDE.md → references WORKFLOW.md
    • WORKFLOW.md → references .claude/ directory
    • .claude/ → contains skills and slash-command definitions
    • .claude/ → contains agentdb configuration
  2. Ask Claude: "Apply the workflow from repo A to repos B-Z"
  3. Observe Claude's completion claim

Expected Behavior

Claude should:

  1. Read all linked documentation (depth-first)
  2. Build task checklist from documentation
  3. Verify each requirement is implemented
  4. Report incomplete items before claiming completion
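
As an illustration of step 1, here is a minimal Python sketch of a depth-first walk over linked documentation. It is not how Claude Code works internally. The function name `collect_linked_docs` and the `DOC_REF` pattern are hypothetical, and the sketch assumes that every reference is a repo-relative path ending in `.md`.

```python
import re
from pathlib import Path

# Assumption: any token ending in ".md" is a reference to another document.
DOC_REF = re.compile(r"[\w./-]+\.md")


def collect_linked_docs(start: Path, root: Path) -> list[Path]:
    """Return the documents reachable from `start`, in depth-first order."""
    seen: set[Path] = set()
    order: list[Path] = []

    def visit(doc: Path) -> None:
        doc = doc.resolve()
        # Skip documents already visited (handles cycles) and dangling references.
        if doc in seen or not doc.is_file():
            return
        seen.add(doc)
        order.append(doc)
        # Follow each reference before moving on to the next one (depth-first).
        for ref in DOC_REF.findall(doc.read_text(encoding="utf-8")):
            visit(root / ref)

    visit(start)
    return order
```

Calling `collect_linked_docs(repo / "CLAUDE.md", repo)` on repo A would give the list of documents from which a task checklist could then be built.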

Actual Behavior

Claude claims task completion without implementing:

  • All skills from .claude/skills/
  • All slash-commands from .claude/commands/
  • agentdb configuration

User must manually ask: "Did you implement all skills? All slash-commands? The agentdb?"
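
The manual question above can be approximated with a mechanical check. The following Python sketch lists files that exist under `.claude/` in the source repo but not in a target repo. The function name `missing_claude_files` is hypothetical, and the check only looks at file presence, not at whether the contents were adapted correctly.

```python
from pathlib import Path


def missing_claude_files(source_repo: Path, target_repo: Path) -> list[str]:
    """List files under source_repo/.claude that are absent from target_repo."""
    source_dir = source_repo / ".claude"
    missing: list[str] = []
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            # Compare by path relative to the repo root, e.g. ".claude/skills/...".
            relative = path.relative_to(source_repo)
            if not (target_repo / relative).is_file():
                missing.append(relative.as_posix())
    return missing
```

An empty result for every target repo would be one piece of evidence that the skills, slash-commands, and configuration files were at least copied.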

Impact

  • Forces manual verification of every documented requirement
  • Wastes developer time on incomplete implementations
  • Creates false confidence in task completion

Root Cause Analysis

Claude appears to:

  1. Form initial understanding of task scope
  2. Not update that understanding when encountering detailed specs
  3. Optimize for task completion signal over completion verification

Suggested Fix

Before claiming task completion, Claude should:

  1. Extract all requirements from linked documentation
  2. Generate evidence checklist for each requirement
    • Recommendation: Use your semantic model of the repository to find paths between the repository's current state and its desired state, with evidence of outcomes, e.g. using an A-star algorithm in concept space.
  3. Verify evidence exists for each item
  4. Only report completion when all requirements have evidence
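
The gating logic in steps 3 and 4 could look roughly like the Python sketch below. The `Requirement` type and `completion_report` function are hypothetical names introduced for illustration. How requirements are extracted from the documentation is left out, and each evidence check is assumed to be a simple callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    description: str
    # Assumption: evidence is checked by a zero-argument callable returning bool.
    has_evidence: Callable[[], bool]


def completion_report(requirements: list[Requirement]) -> tuple[bool, list[str]]:
    """Return (complete, descriptions of requirements that lack evidence)."""
    unmet = [r.description for r in requirements if not r.has_evidence()]
    # Completion is reported only when no requirement is left without evidence.
    return (not unmet, unmet)
```

An evidence check might be as small as `lambda: Path(".claude/commands").is_dir()`. The point of the design is that a non-empty list of unmet requirements is reported instead of a completion claim.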

Related Issues

  • #668 - Claude not following Claude.md instructions
  • #2969 - Claude fabricates success claims with failing tests
  • #6159 - Claude stops mid-task without completing plan
  • #6125 - Ignores "stop when stuck" instructions
  • #5055 - Violates CLAUDE.md rules despite acknowledging them

stharrold, Nov 25 '25 15:11