
Default to a less expensive model

Open · enyst opened this issue 1 year ago • 6 comments

In relation to https://github.com/OpenDevin/OpenDevin/issues/449

This PR proposes a simple safeguard against unexpected costs: default to GPT-3.5. Anyone who wants GPT-4 can still select it in the configuration.

Note that it also changes files in /evaluation, where GPT-4 might make more sense, but user configuration should apply there too.

enyst, Mar 31 '24

At least, I think it's (much) less expensive; I can't find the pricing on the OpenAI website at the moment. 😅

enyst, Mar 31 '24

My only hesitation is that gpt-4 is really noticeably better at generating code. It will give a better out-of-the-box experience with OpenDevin.

But I also understand the urge to keep costs down... I'm a bit torn on this one.

rbren, Apr 01 '24

Quality versus cost is a tradeoff the user can decide on for themselves.

magedhelmy1, Apr 01 '24

> My only hesitation is that gpt-4 is really noticeably better at generating code. It will give a better out-of-the-box experience with OpenDevin.

And at following instructions! That is extremely useful for OpenDevin specifically, given the precision we need here: the model shouldn't just ignore parts of what the prompt said. I used OpenDevin with GPT-3.5 for a couple of weeks (already? yeah) before switching, and the difference is undeniable, even when it can't solve the task. Oh, and GPT-3.5 gets stuck in loops more often.

enyst, Apr 01 '24

We are having a vote in slack: https://opendevin.slack.com/archives/C06P5NCGSFP/p1712156159624529

Please pitch in there and we will choose the one that gets the most votes!

neubig, Apr 03 '24

Seems like the vote favors 3.5! This might need another pass to catch all the gpt-4 references.

rbren, Apr 05 '24

Recently, "Command R", an open-source LLM built for "RAG" (retrieval-augmented generation) applications, was released, which shows that it makes sense to treat the application and the model as two separate concerns. I think it is unreasonable for an application to depend too heavily on one particular LLM. An AIGC implementation should be built on a set of standards, not on a single LLM.

zhonggegege, Apr 07 '24