[MODEL] Claude Code (Opus 4.5) guesses instead of verifying, gives confident wrong answers

Open tvdeyen opened this issue 1 month ago • 3 comments

Preflight Checklist

  • [x] I have searched existing issues for similar behavior reports
  • [x] This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

During a debugging session for a GitHub Actions workflow failure, Claude Code repeatedly gave confident but incorrect answers instead of first researching the actual documentation and source code.

What Claude Actually Did

  1. Guessed at RubyGems trusted publisher configuration - Suggested multiple different configurations (changing repo name, workflow filename path) without first checking how trusted publishing actually works with reusable workflows.
  2. Didn't know reusable workflows don't work with RubyGems trusted publishing - This is documented in an open GitHub issue (rubygems/rubygems.org#4294) from 2023. Claude should have found this immediately instead of having me try multiple configurations.
  3. Suggested disabling MFA for gem pushes - A serious security anti-pattern. When I pushed back, Claude then claimed you could toggle MFA on/off per API key, which was also wrong for my account's MFA level.
  4. Said manual release creation would trigger a workflow_run workflow - Completely incorrect. The post-release workflow triggers on workflow_run from a specific workflow completing, not on release creation events.
  5. Missed the attestations: true default - After switching to API key auth, the workflow failed because attestations only work with trusted publishing. Claude should have caught this when reading the action source code.
  6. Broke the version grep - The grep pattern matched multiple lines containing "VERSION" instead of just the version assignment.
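For item 6, the report does not show the actual grep pattern or version file, so the following is only a sketch of the failure mode under assumed inputs. The `SAMPLE` file contents, the `MyGem` module name, and the `extract_version` helper are all hypothetical. It contrasts a loose match on the word "VERSION" with a pattern anchored to the assignment itself:

```python
import re

# Hypothetical version file; the real one is not shown in the report.
SAMPLE = '''\
# frozen_string_literal: true

module MyGem
  # VERSION is bumped by the release task
  VERSION = "1.2.3"
  MIN_RUBY_VERSION = "3.1"
end
'''

# Loose match: every line that mentions VERSION anywhere,
# which is what an unanchored grep for "VERSION" would return.
loose = [line for line in SAMPLE.splitlines() if "VERSION" in line]

# Anchored match: only a line that starts (after indentation) with the
# constant VERSION and assigns it a quoted string.
VERSION_RE = re.compile(
    r'^[ \t]*VERSION[ \t]*=[ \t]*["\']([^"\']+)["\']',
    re.MULTILINE,
)


def extract_version(source: str) -> str:
    """Return the single VERSION value, or fail loudly if it is ambiguous."""
    matches = VERSION_RE.findall(source)
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one VERSION assignment, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    print(len(loose), "lines match the loose pattern")
    print("anchored version:", extract_version(SAMPLE))
```

Failing when the match count is not exactly one is a deliberate choice in this sketch: a release step that silently picks the first of several matches is harder to debug than one that stops.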

Root cause

Claude repeatedly made guesses and stated them confidently instead of:

  • Reading source code first (rubygems/release-gem, rubygems/configure-rubygems-credentials)
  • Searching for known issues
  • Verifying claims before stating them

When I asked "are you guessing now?" after multiple failed attempts, Claude admitted it was.

Impact

  • Multiple hours wasted
  • Multiple failed workflow runs
  • Had to downgrade RubyGems MFA settings from "UI and API" to "UI and gem signin"
  • Release left incomplete (gem published, but GitHub Release and post-release workflow didn't run)

Expected Behavior

Claude should research before answering, especially for integration/configuration issues. If uncertain, say so upfront rather than giving confident wrong answers. The most expensive model should not require users to repeatedly ask "show me proof" to get accurate information.

Files Affected


Permission Mode

I don't know / Not sure

Can You Reproduce This?

Haven't tried to reproduce

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation


Impact

Medium - Extra work to undo changes

Claude Code Version

2.0.64 (Claude Code)

Platform

Anthropic API

Additional Context

No response

tvdeyen avatar Dec 10 '25 18:12 tvdeyen

Found 3 possible duplicate issues:

  1. https://github.com/anthropics/claude-code/issues/11913
  2. https://github.com/anthropics/claude-code/issues/8782
  3. https://github.com/anthropics/claude-code/issues/4011

This issue will be automatically closed as a duplicate in 3 days.

  • If your issue is a duplicate, please close it and 👍 the existing issue instead
  • To prevent auto-closure, add a comment or 👎 this comment

🤖 Generated with Claude Code

github-actions[bot] avatar Dec 10 '25 18:12 github-actions[bot]

This is kind of normal for LLMs. There could be numerous causes, and it's not likely to be caused by a software bug specifically. They had a big degradation problem that they published a deep infrastructure report on. That issue they describe in there is the only way. Also, you don't know if it's even true when it says it's making things up; that's just what it says. It's probably true, it's just not necessarily true.

CookieSalesman avatar Dec 11 '25 07:12 CookieSalesman