[Bug]: Claude 3.7 Not following directions.
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
Describe the bug and reproduction steps
Ever since the change to Claude 3.7, the coding quality has gone way up and coding mistakes have gone way down. However, I am having an issue with Claude not listening to directions. This isn't an isolated incident, it's happening consistently. I am wondering if adjusting the LLM temperature would help? What is the suggested way of doing that? Has anyone else noticed this lately? I've asked it not to do something multiple times and it proceeded to do it 5 times, and each time it even acknowledged afterwards that it did what it wasn't supposed to. It's very odd. I'm wondering if the system prompts might not be giving enough emphasis to following directions. When I asked it why it repeated the same mistake over and over, it responded that it is mistakenly following its training over the user instructions. If anyone has any ideas on how to get Claude back under control, it would be much appreciated.
OpenHands Installation
Docker command in README
OpenHands Version
No response
Operating System
None
Logs, Errors, Screenshots, and Additional Context
No response
I'm not sure about this but I wanted to mention that 0.28 seemed to work better than 0.28.1. Not sure if it's a coincidence or maybe they made some changes to Claude but I've noticed a difference.
I realized that memory condensation got turned on when I switched to using dev mode which once again made OpenHands unusable in real life. When I was looking at the config.template.toml I saw there are a lot of settings. I'm wondering if maybe I don't have it set up correctly. I don't have any settings selected for it.
Just to follow up, undoing the condensation helps but it's still very unruly. It used to take a couple of messages before it would respond and do what I ask. Now it will completely ignore instructions even if I ask multiple times. It must have to do with Claude 3.7.
Has anyone else noticed this lately? I've asked it not to do something multiple times and it proceeded to do it 5 times, and each time it even acknowledged afterwards that it did what it wasn't supposed to. It's very odd. I'm wondering if the system prompts might not be giving enough emphasis to following directions.
Yes, I have seen this in multiple places (with Claude 3.7, not necessarily with OpenHands). It really is very jumpy and goes off doing stuff, and that stuff is not necessarily what the user said.
Personally, I now try to make the FIRST message as clear as possible, containing everything important. It may sound obvious, but TBH I haven't always done that. Other times I would start with a small thing, then add another small thing, etc., and it was working; with 3.7 and OpenHands today, I think the first option works significantly better.
I'm not sure about this but I wanted to mention that 0.28 seemed to work better than 0.28.1. Not sure if it's a coincidence or maybe they made some changes to Claude but I've noticed a difference.
I think we made a system prompt update in 0.28.1 🤔 It was significant, with a lot of changes, and the same goes for the tools: [agent] system message
The point, in part, was to adapt better to 3.7. Maybe we haven't done enough, or maybe we've done too much?
To add, I also need to watch it, to see when it goes off on a tangent and stop it. I wasn't concerned about that before 3.7, maybe because I felt it wasn't going to go for long I guess.
Now with 3.7, it's like it had way too many coffees 😂
I have found the same thing with 3.7 in general. I have had better luck (both in Cursor and OpenHands) adding stuff like "Only implement what is asked, don't go ahead of what is stated, don't assume, follow the specs, don't deviate from the instructions, don't reinvent the wheel, use standards" to my prompts.
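For anyone who wants to apply the two workarounds from this thread consistently (put everything important in the first message, and append explicit constraints), here is a minimal sketch of a helper that assembles such a message. The names `GUARDRAILS` and `build_first_message` are hypothetical and are not part of OpenHands or Cursor; the constraint wording is paraphrased from the comment above, and whether it actually improves instruction following is anecdotal.

```python
# Hypothetical helper for composing a single, complete first message.
# Nothing here is an OpenHands or Cursor API; it only builds a string.

# Constraint phrases paraphrased from the suggestion in this thread.
GUARDRAILS = (
    "Only implement what is asked.",
    "Do not go ahead of what is stated.",
    "Do not assume.",
    "Follow the spec and do not deviate from the instructions.",
    "Do not reinvent the wheel; use standards.",
)


def build_first_message(task, requirements=(), guardrails=GUARDRAILS):
    """Combine the task, its requirements, and the guardrails into one message."""
    parts = [task.strip()]
    if requirements:
        # List every requirement up front instead of adding them in later messages.
        parts.append("Requirements:\n" + "\n".join(f"- {r}" for r in requirements))
    if guardrails:
        parts.append("Constraints:\n" + "\n".join(f"- {g}" for g in guardrails))
    # Blank lines separate the three sections.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_first_message(
        "Fix the login bug.",
        ["Do not touch the database schema.", "Keep the existing tests passing."],
    ))
```

The resulting text can be pasted in as the first message of a new conversation; the task and requirements in the usage example are made up for illustration.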
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Is this a Claude thing, or is this something we can make better on our side?
Is it better with Claude 4?
This issue was closed because it has been stalled for over 30 days with no activity.