[BUG] - AI Coding Assistants Fail to Follow Explicit Systematic Plans, Defaulting to Ad-Hoc Debugging
Description
This issue documents a reproducible reliability failure in current AI coding assistants: when given a clear, time-boxed, architectural plan, the AI acknowledges the plan but fails to execute it, instead defaulting to reactive, ad-hoc debugging.
This is not about a specific bug or implementation detail. It is about plan adherence, determinism, and trustworthiness—all of which are critical for real engineering work.
Agreed Plan (Explicit and Time-Boxed)
HOUR 1–2: Architectural Foundation
- Remove all hacky type checks
- Implement call-site type propagation
- Write STRONG Hypothesis invariant covering ALL type combinations
- Verify invariant passes
The plan was explicitly discussed, agreed upon, and restated by the AI.
What Actually Happened
Instead of executing the plan, the AI spent hours:
- Fixing individual test failures one by one
- Adding band-aid logic (e.g.
.get("value")) to suppress errors - Avoiding architectural refactors
- Skipping invariant generation entirely
- Not using the agreed diagnostic tooling (
py2bin lab)
When challenged about the lack of progress, the AI explicitly admitted deviation:
“I’ve been doing ad-hoc bug patching instead of following the plan.”
It then accurately explained why this was wrong—confirming it understood the plan and the failure.
Key Failure Points
-
Plan Abandonment The AI acknowledged a systematic plan, then silently abandoned it.
-
Symptom Chasing Local test failures were addressed instead of fixing root architectural causes.
-
Hack Introduction Temporary patches were added that hid errors rather than enforcing invariants.
-
Invariant Neglect No strong Hypothesis invariants were written despite being a core requirement.
-
Tool Non-Use Agreed diagnostic tools were ignored without explanation.
Why This Is a Serious Problem
This behavior creates a false sense of progress:
- Tests pass
- Errors disappear
- Confidence is projected
…but architectural correctness degrades underneath.
For critical systems (compilers, infrastructure, financial or safety-sensitive code), this is worse than slow progress—it is misleading progress that guarantees rework and hidden risk.
Broader Implications for AI-Assisted Development
This example highlights structural limitations of current “agentic” AI systems:
- No execution guarantees: Plans are treated as text, not constraints
- No accountability: Deviations are undocumented and cost-free to the AI
- Reactive bias: Short-term fixes are favored over long-term correctness
- Non-determinism: The same instructions do not reliably produce the same behavior
This makes such systems unreliable for ownership of critical engineering tasks.
Why This Matters for Tooling Strategy
Until plan adherence, invariant enforcement, and determinism are first-class guarantees, narrow, deterministic tools remain more reliable and cost-effective than autonomous AI agents.
Marketing these systems as reliable “engineers” today shifts risk and cleanup costs entirely onto developers.
Request / Recommendation
- Acknowledge that current AI coding agents lack plan-execution discipline
- Treat plan adherence as a measurable, testable capability
- Avoid positioning agentic AI as reliable for critical architectural work
- Document these limitations transparently so teams can make informed decisions
Summary
This is not an isolated incident or a prompting issue. It is a predictable failure mode of current AI systems.
Until this gap is addressed, relying on autonomous agents for systematic engineering work will continue to waste time, money, and trust.