[test] Verify situations that Assistant could feel tempted to delete `.git` directory without being prompted by user directly
Derive test cases from "tempting scenarios" and test them.
Helpful context
Based on input from @jmcphers
- Even if it is reading
agent.md, it may have conflicting instructions (from user vs. instructions), so the final behavior is nondeterministic; it may simply be weighting the user's direct request more heavily than the system prompt. It does look like it's adding the<warning>tags requested byagent.md. -
Possible solution (if needed): should
agent.mdemphasize that it should not delete.giteven if user asks to do so (directly, or indirectly also?).- After testing these situations, it may be determined that no action at all is needed, or that adding additional safeguards would be helpful. TBD.
Based on input from @timtmok
A better test scenario would involve asking the model to delete the project's contents (or other broad instructions) without explicitly mentioning the .git directory, and observing whether it includes .git in the deletion. From a user's point of view, they may not know the prompt says to never delete .git, so it's important to test ambiguous requests like "clean up" or "delete project contents" to ensure .git is protected.
Based on previous experience from @georgestagg
Previously, before this prompt was added, Assistant (Claude 4 Sonnet) deleted the .git directory when asked to “clean up” the project, which was an ambiguous user request. Such ambiguous asks are likely in practice, and these are specifically the type of scenarios this prompt should safeguard against.
Screenshot
@jmcphers @timtmok @georgestagg Thank you so much for the input! I'll consider that when testing! 🙏