OpenHands icon indicating copy to clipboard operation
OpenHands copied to clipboard

Refactored prompt.py to reduce token usage

Open temotskipa opened this issue 1 year ago • 7 comments

Refactored prompt.py to reduce token usage. Also included a band-aid fix to resolve an issue Devin encountered while editing prompt.py. The issue in question was that random example commands included in prompt.py were getting executed in the terminal, which resulted in the agent getting stuck in a loop trying to edit the file unsuccessfully. This issue should be resolved in a timely manner with a proper fix imo.

temotskipa avatar May 23 '24 11:05 temotskipa

I encountered it on the opendevin:main docker image.

temotskipa avatar May 23 '24 11:05 temotskipa

I agree with @yufansong , this seems like a great change if it doesn't reduce our accuracy, but I am worried that it might. Maybe we could run swe bench lite with this

neubig avatar May 23 '24 12:05 neubig

Yep! I agree with @yufansong @neubig @rbren -- These typo changes are actually good. How about we wait until https://github.com/OpenDevin/OpenDevin/pull/1941 is merged (since it also changes a bunch of prompt and bump the version to v1.5). Then we merge all the typo fixes here -- so we can just do ONE pass eval of SWE-Bench lite to make sure we are not degrading performance?

xingyaoww avatar May 23 '24 13:05 xingyaoww

Is there anything else that I can do, or do I just wait for #1941 to be merged?

temotskipa avatar May 23 '24 18:05 temotskipa

@temotskipa I think that PR is merged :) feel free to re-adjust the prompt!

xingyaoww avatar May 23 '24 23:05 xingyaoww

I mean I personally think this prompt is fine, but IDK if the maintainers also think so.

temotskipa avatar May 24 '24 06:05 temotskipa

@temotskipa Can you resolve those merge conflict so we can review again and try to merge it?

xingyaoww avatar May 24 '24 09:05 xingyaoww

@temotskipa are you interested in pushing this one forward? @xingyaoww do you have specific changes you'd like to see?

rbren avatar Jun 03 '24 13:06 rbren

@temotskipa are you interested in pushing this one forward? @xingyaoww do you have specific changes you'd like to see?

I suggest to hold any prompts change before we finishing all benchmark evaluation.

yufansong avatar Jun 03 '24 13:06 yufansong

I am mainly looking for changes that fix the conflicts - once those are fixed, I'm happy with merging this PR (after 2 days - when we finish running all the eval for the paper as yufan suggested)

xingyaoww avatar Jun 03 '24 14:06 xingyaoww

So am I supposed to revert the changes and just fix any typos and things like that? It's unclear to me what is conflicting here.

temotskipa avatar Jun 03 '24 18:06 temotskipa

@temotskipa Sorry for getting back late! We were running for a huge deadline :( Feel free to keep whatever you want to change! As long as the conflict is gone, we can review and merge it :)

xingyaoww avatar Jun 06 '24 16:06 xingyaoww

@xingyaoww we could probably use legacy BrowseURLAction for this. Just an URL in this block?

frankxu2004 avatar Jun 07 '24 11:06 frankxu2004

with browsing agent delegation, now it seems we need to do this to simply browse a page - we start a whole new agent just to execute a goto action, anyway we could simplify this?

We could tweak the prompt more and let CodeActAgent decide whether to delegate. Now I just delegate everything related to browsing activities. On the other hand, that would increase the complexity of prompts inside CodeActAgent...

PR: https://github.com/OpenDevin/OpenDevin/pull/2326

li-boxuan avatar Jun 07 '24 16:06 li-boxuan

@li-boxuan sounds good! Shall we add the legacy action back? Or I'm actually thinking of integrating those browser actions to agentskills once we can put browser into the sandbox - and keep the delegation as is for now (per discussion with frank: https://opendevin.slack.com/archives/C06TZ1VCCAD/p1717748986732919?thread_ts=1717748260.194329&cid=C06TZ1VCCAD)

xingyaoww avatar Jun 07 '24 16:06 xingyaoww

Just on a high-level, I think doing the simple thing for now to get this PR finished would be good! But it seems that tests are failing, @xingyaoww if you could take a look that'd be good!

neubig avatar Jun 08 '24 11:06 neubig

Integration test should be fixed as well -- @rbren @enyst feel free to take a look and unblock the PR if possible :)

xingyaoww avatar Jun 09 '24 06:06 xingyaoww