OpenHands
[Bug]: Issues w/ CodeAct + gpt-4-turbo
Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting
- [X] I have checked the existing issues.
Describe the bug
Hi All,
Thanks for your initiative. Please forgive my grumpy title and comments, but despite sharing your vision, I was unable to produce any useful code after trying for almost an hour. I've tried gpt-3.5-turbo and gpt-4-turbo; what I get is mostly an endless loop where the model is unaware of the context. It keeps giving back instructions -- really verbose ones, burning through my tokens without any actual solutions. Frequently the conversation gets stuck in an endless loop while the pause/stop button is unresponsive, and I had to kill Docker to terminate it.
Just so many issues, how could this have ended up in stable?
Current Version
0.5.2
Installation and Configuration
Using the installation steps from the docs.
Model and Agent
- gpt-4-turbo: CodeAct agent
Reproduction Steps
No response
Logs, Errors, Screenshots, and Additional Context
No response
Could you provide any logs?
This is a bit strange--CodeAct + gpt-4-turbo works surprisingly well for me.
Can you give some examples of tasks you tried?
Also--are you sure you're on 0.5.5? We haven't released past 0.5.2 😄
And worth noting that nothing is stable right now--we're still on 0.x, hoping to get a 1.0 out in the next couple months
Sorry, I've used 0.5 initially (copy/pasted the docker command from the docs), then switched to 0.5.2 but the behavior remained the same.
I've created a test.py file in the workspace, copied a relatively simple function from my project, and added a TODO comment after the function describing a new function that needs to be implemented.
Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"
It found the comment in the file, but instead of extending the script, it just gave a long explanation in the conversation along with example code that "I'm supposed to implement".
On the second try, it entered an infinite loop repeating the same instructions, while the terminal complained about an indentation error.
I will repeat the exercise and eventually copy the output, but if you reproduce the above scenario it should be quite close to what I did.
I wonder if this prompt is the issue:
Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"
IIUC the internal prompt refers to that text as its task
. So the LLM might be confused as to what "task" refers to
@vedtam
On the second try
- Did you mean the next step after you asked to implement it? or
- from Step 0 again?
Will be back to my office and share the source file along with the prompt as soon as possible.
In this new run it's not far from completing the task, but as you can see it asks me to go in and manually fix the indentation, then suggests what the finished code could look like:
test.py
```python
from together import Together
from typing import Dict, Optional
from openai import OpenAI
import sentry_sdk
import json

def llamaChat(self, system_prompt: str, user_prompt: str) -> Optional[Dict]:
  messages = [{
    'role': 'system',
    'content': system_prompt
  },
  {
    'role': 'user',
    'content': user_prompt
  }]
  client = Together(api_key='19c1ada054415d7ad3c51809XXXX-XXXX')
  response = client.chat.completions.create(
    model='meta-llama/Llama-3-8b-chat-hf',
    messages=messages,
  )
  content = response.choices[0].message.content
  try:
    return json.loads(content)
  except Exception as e:
    data = llmJsonFix(content)
    if data:
      return data
    else:
      sentry_sdk.capture_message(f'Error parsing JSON in answer! Content: {content}, Exception: {e}')
      return None

# TODO: Implement the function llmJsonFix. This will use the openai library
# with the model set to "gpt-3.5-turbo" and fix the JSON dictionary in Llama's
# response if incorrect. llmJsonFix will attempt to repair the JSON object
# no more than 3 times before returning None!
def llmJsonFix(content: str) -> Optional[Dict]:
  # to be implemented
  return None
```
prompt
open the test.py file from the workspace and follow the TODO comment describing the task you need to complete.
log
screenshot
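For reference, the retry structure that TODO asks for could be sketched like this (a sketch only, keeping the file's 2-space indentation; `ask_model` is a hypothetical stand-in for the actual openai chat completion with model="gpt-3.5-turbo", which is not called here):

```python
import json
from typing import Callable, Dict, Optional

def llmJsonFix(content: str,
               ask_model: Callable[[str], str],
               max_attempts: int = 3) -> Optional[Dict]:
  # Ask a model to repair a malformed JSON string, trying at most
  # max_attempts times before giving up. ask_model is a hypothetical
  # stand-in for the real gpt-3.5-turbo chat call.
  for _ in range(max_attempts):
    content = ask_model(
      'Fix this so it is valid JSON. Reply with the JSON only:\n' + content)
    try:
      return json.loads(content)
    except json.JSONDecodeError:
      continue  # still broken; ask again with the latest attempt
  return None
```

Injecting the model call as a parameter also makes the retry logic testable without burning tokens.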
Just to note, when switching back to gpt-3.5-turbo (the above uses gpt-4), it stops at STEP 1 entirely:
I'm wondering, are there some example projects I could try to reproduce and eventually adapt my prompting style for better results?
As a random bystander to this, @vedtam it looks like the tooling only supports 4-space tabs. "E111 indentation is not a multiple of 4". Maybe you can disable that warning, or indent your file differently?
@vedtam, your file uses 2 spaces for indentation. Why?
I prefer 2 spaces. But the number of spaces shouldn't really matter.
Violates PEP8. What is the reason for your preference?
https://peps.python.org/pep-0008/#indentation
Also, in the TODO, could you mention your indentation style and run it again?
I'm mostly writing JavaScript, where I use 2. Guides are good, especially when working in a team. But Python should run with any number of spaces.
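That last point is easy to check: the interpreter only requires indentation to be consistent within a block, so 2-space code runs fine; it's only a pycodestyle-style linter that objects with E111:

```python
# Valid Python despite the 2-space indent; a flake8/pycodestyle pass over
# this line would still report E111 ("indentation is not a multiple of
# four"), which is a style warning, not a syntax error.
def double(x):
  return 2 * x

print(double(21))  # 42
```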
Practically, any of the issues I mentioned in this thread can be solved; I'm already looking at the source. I just wanted to highlight some I've met during my first encounter with the library, so others know about them and can eventually look into them.
Here E111 https://github.com/OpenDevin/OpenDevin/blob/9ccf17a63bcf2667c82b8de0e789e8075bf59137/opendevin/runtime/plugins/swe_agent_commands/cursors_edit_linting.sh#L40
I'm mostly writing JavaScript, where I use 2
Will you press one tab or 2 spaces?
One tab. Oh, I see, there's a linter, that makes sense.
Oh interesting. I actually dislike that we're shoving that opinion into CodeAct--I'm going to file a separate ticket for it.
I don't think there's much for us to solve in this issue--seems like a tricky prompt!--so I'm going to close it