OpenHands
[Bug]: Issues w/ CodeAct + gpt-4-turbo
Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting
- [X] I have checked the existing issues.
Describe the bug
Hi All,
Thanks for your initiative. Please forgive my grumpy title and comments, but despite sharing your vision, I was unable to produce any useful code after trying for almost an hour. I've tried gpt-3.5-turbo and gpt-4-turbo; what I get is mostly an endless loop where the model is unaware of the context. It keeps giving back instructions -- really verbose ones, burning through my tokens without any actual solutions. Frequently the conversation gets stuck in an endless loop while the pause/stop button is unresponsive, and I had to kill Docker to terminate it.
Just so many issues, how could this have ended up in stable?
Current Version
0.5.2
Installation and Configuration
Using the installation steps from the docs.
Model and Agent
- gpt-4-turbo: CodeAct agent
Reproduction Steps
No response
Logs, Errors, Screenshots, and Additional Context
No response
Could you provide any logs?
This is a bit strange--CodeAct + gpt-4-turbo works surprisingly well for me.
Can you give some examples of tasks you tried?
Also--are you sure you're on 0.5.5? We haven't released past 0.5.2 😄
And worth noting that nothing is stable right now--we're still on 0.x, hoping to get a 1.0 out in the next couple months
Sorry, I've used 0.5 initially (copy/pasted the docker command from the docs), then switched to 0.5.2 but the behavior remained the same.
I've created a test.py file in the workspace, copied a relatively simple function from my project, and added a TODO comment after the function describing a new function that needs to be implemented.
Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"
It found the comment in the file, but instead of extending the script, it just gave a long explanation in the conversation along with example code that "I'm supposed to implement".
On the second try, it entered an infinite loop repeating the same instructions, while the terminal complained about an indentation error.
I will repeat the exercise and eventually copy the output, but if you reproduce the above scenario it should be quite close to what I did.
I wonder if this prompt is the issue:
Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"
IIUC the internal prompt refers to that text as its task
. So the LLM might be confused as to what "task" refers to
@vedtam
On the second try
- Did you mean the next step after you asked to implement it? or
- from Step 0 again?
Will be back to my office and share the source file along with the prompt as soon as possible.
In this new run it's not far from completing the task, but as you can see it asks me to go in and manually fix the indentation, then suggests what the finished code could look like:
test.py
```python
from together import Together
from typing import Dict, Optional
from openai import OpenAI
import sentry_sdk
import json

def llamaChat(self, system_prompt: str, user_prompt: str) -> Optional[Dict]:
  messages = [{
    'role': 'system',
    'content': system_prompt
  },
  {
    'role': 'user',
    'content': user_prompt
  }]
  client = Together(api_key='19c1ada054415d7ad3c51809XXXX-XXXX')
  response = client.chat.completions.create(
    model='meta-llama/Llama-3-8b-chat-hf',
    messages=messages,
  )
  content = response.choices[0].message.content
  try:
    return json.loads(content)
  except Exception as e:
    data = llmJsonFix(content)
    if data:
      return data
    else:
      sentry_sdk.capture_message(f'Error parsing JSON in answer! Content: {content}, Exception: {e}')
      return None

# TODO: Implement the function llmJsonFix. This will use the openai library
# with the model set to "gpt-3.5-turbo" and fix the JSON dictionary in Llama's
# response if incorrect. llmJsonFix will attempt to repair the JSON object
# no more than 3 times before returning None!
def llmJsonFix(content: str) -> Optional[Dict]:
  # to be implemented
  return None
```
prompt
open the test.py file from the workspace and follow the TODO comment describing the task you need to complete.
log
screenshot
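For reference, the retry structure that TODO asks for could be sketched like this (a sketch only, keeping the file's 2-space indentation; `ask_model` is a hypothetical stand-in for the actual openai chat completion with model="gpt-3.5-turbo", which is not called here):

```python
import json
from typing import Callable, Dict, Optional

def llmJsonFix(content: str,
               ask_model: Callable[[str], str],
               max_attempts: int = 3) -> Optional[Dict]:
  # Ask a model to repair a malformed JSON string, trying at most
  # max_attempts times before giving up. ask_model is a hypothetical
  # stand-in for the real gpt-3.5-turbo chat call.
  for _ in range(max_attempts):
    content = ask_model(
      'Fix this so it is valid JSON. Reply with the JSON only:\n' + content)
    try:
      return json.loads(content)
    except json.JSONDecodeError:
      continue  # still broken; ask again with the latest attempt
  return None
```

Injecting the model call as a parameter also makes the retry logic testable without burning tokens.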
Just to note, when switching back to gpt-3.5-turbo (the above uses gpt-4), it stops at STEP 1 entirely:
I'm wondering, are there some example projects I could try to reproduce and eventually adapt my prompting style for better results?
As a random bystander to this, @vedtam it looks like the tooling only supports 4-space tabs. "E111 indentation is not a multiple of 4". Maybe you can disable that warning, or indent your file differently?
@vedtam, your file uses 2 spaces for indentation. Why?
I prefer 2 spaces. But the number of spaces shouldn't really matter.
Violates PEP8. What is the reason for your preference?
https://peps.python.org/pep-0008/#indentation
Also, in the TODO, could you mention your indentation style and run it again?
I'm mostly writing JavaScript, where I use 2. Guides are good, especially when working in a team. But Python should run with any number of spaces.
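That last point is easy to check: the interpreter only requires indentation to be consistent within a block, so 2-space code runs fine; it's only a pycodestyle-style linter that objects with E111:

```python
# Valid Python despite the 2-space indent; a flake8/pycodestyle pass over
# this line would still report E111 ("indentation is not a multiple of
# four"), which is a style warning, not a syntax error.
def double(x):
  return 2 * x

print(double(21))  # 42
```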
Practically, any of the issues I mentioned in this thread can be solved; I'm already looking at the source. I just wanted to highlight some I've met during my first encounter with the library, so others know about them and can eventually look into them.
Here E111 https://github.com/OpenDevin/OpenDevin/blob/9ccf17a63bcf2667c82b8de0e789e8075bf59137/opendevin/runtime/plugins/swe_agent_commands/cursors_edit_linting.sh#L40
I'm mostly writing JavaScript, where I use 2
Will you press one tab or 2 spaces?
One tab. Oh, I see, there's a linter, that makes sense.
Oh interesting. I actually dislike that we're shoving that opinion into CodeAct--I'm going to file a separate ticket for it.
I don't think there's much for us to solve in this issue--seems like a tricky prompt!--so I'm going to close it