
[Bug]: Claude 3.7 Not following directions.

Open amirshawn opened this issue 9 months ago • 8 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

Describe the bug and reproduction steps

Ever since the change to Claude 3.7, the coding quality has gone way up and coding mistakes have gone way down. However, I am having an issue with Claude not following directions. This isn't an isolated incident; it's happening consistently. I am wondering if adjusting the LLM temperature would help. What is the suggested way of doing that? Has anyone else noticed this lately? I've asked it not to do something multiple times and it proceeded to do it 5 times, and each time it even acknowledged afterwards that it did what it wasn't supposed to. It's very odd. I'm wondering if the system prompts aren't giving enough emphasis on following directions. When I asked it why it repeated the same mistake over and over, it responded that it is mistakenly following its training over the user instructions. If anyone has any ideas on how to get Claude back under control, it would be much appreciated.

OpenHands Installation

Docker command in README

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

amirshawn avatar Mar 15 '25 04:03 amirshawn

I'm not sure about this but I wanted to mention that 0.28 seemed to work better than 0.28.1. Not sure if it's a coincidence or maybe they made some changes to Claude but I've noticed a difference.

amirshawn avatar Mar 15 '25 04:03 amirshawn

I realized that memory condensation got turned on when I switched to using dev mode which once again made OpenHands unusable in real life. When I was looking at the config.template.toml I saw there are a lot of settings. I'm wondering if maybe I don't have it set up correctly. I don't have any settings selected for it.

amirshawn avatar Mar 15 '25 08:03 amirshawn

Just to follow up, undoing the condensation helps but it's still very unruly. It used to take a couple of messages before it would respond and do what I ask. Now it will completely ignore instructions even if I ask multiple times. It must have to do with Claude 3.7.

amirshawn avatar Mar 15 '25 10:03 amirshawn

Has anyone else noticed this lately? I've asked it not to do something multiple times and it proceeded to do it 5 times, and each time it even acknowledged afterwards that it did what it wasn't supposed to. It's very odd. I'm wondering if the system prompts aren't giving enough emphasis on following directions.

Yes, I have seen this in multiple places (with claude 3.7, not with openhands necessarily), it really is very jumpy and goes off doing stuff, and that stuff is not necessarily what the user said.

Personally, I try now to give it the FIRST message as clear as possible, and containing everything important. It may sound obvious, but TBH I haven't always done that - other times I was starting with a small thing, then add another small thing etc., and it was working; with 3.7 and openhands today I think the first option works significantly better.
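
As a rough illustration of the "everything in the first message" approach, here is a minimal Python sketch. The function name `build_first_message` and the section labels are made up for this example; nothing here is part of OpenHands or any Claude API, it only assembles a string to paste as the opening message.

```python
def build_first_message(goal: str, requirements: list[str], constraints: list[str]) -> str:
    """Combine the goal, requirements, and constraints into one opening message.

    Hypothetical helper: the layout ("Goal:", "Requirements:", "Do NOT:")
    is an assumption, not a format that OpenHands expects.
    """
    lines = [f"Goal: {goal}", "", "Requirements:"]
    lines += [f"- {r}" for r in requirements]
    # Only add the constraints section when there is something to forbid.
    if constraints:
        lines += ["", "Do NOT:"]
        lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_first_message(
        "Add a login page",
        ["Use the existing auth module", "Add unit tests"],
        ["Touch the database schema"],
    ))
```

The idea is simply to state the things the agent must not do up front, together with the task, instead of adding them one message at a time.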

I'm not sure about this but I wanted to mention that 0.28 seemed to work better than 0.28.1. Not sure if it's a coincidence or maybe they made some changes to Claude but I've noticed a difference.

We made a system prompt update in 0.28.1 I think 🤔 Significant, with a lot of changes, and the same to tools: [agent] system message

The point in part was to adapt better to 3.7. Maybe we haven't adapted enough, or maybe too much?

enyst avatar Mar 16 '25 07:03 enyst

To add, I also need to watch it, to see when it goes off on a tangent and stop it. I wasn't concerned about that before 3.7, maybe because I felt it wasn't going to go for long I guess.

Now with 3.7, it's like it had way too many coffees 😂

enyst avatar Mar 16 '25 07:03 enyst

I have found the same thing with 3.7 in general. I have had better luck (both in Cursor and OpenHands) adding stuff like "Only implement what is asked, don't go ahead of what is stated, don't assume, follow the specs, don't deviate from the instructions, don't reinvent the wheel, use standards" to my prompts.
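
For anyone who wants to apply this consistently, a small Python sketch of prepending such rules to every prompt might look like the following. `GUARDRAILS` and `with_guardrails` are hypothetical names for this example, and the "Rules:" / "Task:" layout is an assumption rather than anything Cursor or OpenHands requires.

```python
# Hypothetical rule list, taken from the wording suggested in this comment.
GUARDRAILS: tuple[str, ...] = (
    "Only implement what is asked.",
    "Don't go ahead of what is stated.",
    "Don't assume.",
    "Follow the specs.",
    "Don't deviate from the instructions.",
    "Don't reinvent the wheel; use standards.",
)


def with_guardrails(task: str, guardrails: tuple[str, ...] = GUARDRAILS) -> str:
    """Return the task text with the rules listed in front of it."""
    rules = "\n".join(f"- {g}" for g in guardrails)
    return f"Rules:\n{rules}\n\nTask:\n{task.strip()}"


if __name__ == "__main__":
    print(with_guardrails("Rename the function foo to bar in utils.py."))
```

Whether this actually reduces tangents is anecdotal; it just saves retyping the same rules each time.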

tholum avatar Mar 17 '25 01:03 tholum

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Apr 16 '25 02:04 github-actions[bot]

Is this a claude thing or is this something we can make better on our side?

mamoodi avatar Apr 17 '25 17:04 mamoodi

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jun 06 '25 02:06 github-actions[bot]

Is it better with Claude 4?

mamoodi avatar Jun 06 '25 12:06 mamoodi

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 07 '25 02:07 github-actions[bot]

This issue was closed because it has been stalled for over 30 days with no activity.

github-actions[bot] avatar Jul 15 '25 02:07 github-actions[bot]