OpenHands Default to a less expensive model

In relation to https://github.com/OpenDevin/OpenDevin/issues/449

This PR proposes a simple safeguard against unexpected costs: default to GPT-3.5. Anyone who wants GPT-4 can select it in the configuration.

Note that it also changes files in /evaluation, where GPT-4 might make more sense, but user configuration should apply there too.
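
As a rough sketch of the idea (not the actual diff in this PR): the cheaper model is only a fallback, and an explicit user setting always wins. The `LLM_MODEL` variable name, the `DEFAULT_MODEL` constant, the `resolve_model` helper, and the exact model strings are assumptions made for illustration.

```python
import os

# Hypothetical default; the model string actually used in the PR may differ.
DEFAULT_MODEL = "gpt-3.5-turbo"


def resolve_model(env=None):
    """Return the model to use, preferring an explicit user setting over the default."""
    # Accept a plain dict so the lookup can be exercised without touching os.environ.
    env = os.environ if env is None else env
    # LLM_MODEL is an assumed variable name, used here only for illustration.
    chosen = env.get("LLM_MODEL", "").strip()
    # An unset or blank value falls back to the cheaper default.
    return chosen or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model({}))                      # no user setting
    print(resolve_model({"LLM_MODEL": "gpt-4"}))  # user opts in to GPT-4
```

With a lookup like this, the /evaluation scripts could call the same helper, so a user's choice of GPT-4 would apply there as well.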

Mar 31 '24 23:03 enyst

At least, I think it's (much) less expensive; I can't find it on the OpenAI website atm. 😅

Mar 31 '24 23:03 enyst

My only hesitation is that gpt-4 is really noticeably better at generating code. It will give a better out-of-the-box experience with OpenDevin.

But I also get the urge to keep costs down... I'm a bit torn on this one.

Apr 01 '24 02:04 rbren

Quality versus cost is a tradeoff that users can decide for themselves.

Apr 01 '24 07:04 magedhelmy1

> My only hesitation is that gpt-4 is really noticeably better at generating code. It will give a better out-of-the-box experience with OpenDevin.

And at following instructions! That's extremely useful precisely for OpenDevin, for the kind of precision that we need here, so that it doesn't just ignore some of what the prompt said. I used OpenDevin with GPT-3.5 for a couple of weeks (is it already? yeah) before switching, and the difference is undeniable, even when it can't solve the task. Oh, and GPT-3.5 gets stuck in a loop more often.

Apr 01 '24 18:04 enyst

We are having a vote in Slack: https://opendevin.slack.com/archives/C06P5NCGSFP/p1712156159624529

Please pitch in there and we will choose the one that gets the most votes!

Apr 03 '24 14:04 neubig

Seems like the vote favors 3.5! Might need another pass through this to catch all the gpt-4 references

Apr 05 '24 14:04 rbren

The recent release of "Command R", an open source LLM aimed at "RAG" applications, shows that it is reasonable to treat the application and the model as two separate concerns. I think it is unreasonable for the application to rely too heavily on one particular LLM. AIGC should be implemented against a set of standards, not a single LLM.

Apr 07 '24 04:04 zhonggegege