
Not finding a useful Ollama model for stock example

Open abdinal1 opened this issue 1 year ago • 6 comments

I'm running the stock example with open-source models on Ollama. I have tried various open models (without writing my own Modelfile), and they always ended up either delivering bad outputs, generating Python syntax errors, or producing no meaningful results at all.

Some models I have tried (directly from the Ollama hub; I have tried even more, varying parameter size and precision):

  • Openhermes 2.5
  • Llama 2
  • Mixtral
  • nous-hermes2
  • even ChatGPT 3.5 (delivered a useless report at the end)
  • GPT-4: when executing YahooFinanceNews it crashed with a Python KeyError: 'description'. Any ideas?

Has anyone had success with open-source models for the stock recommendation task yet? If so, which model was it, and did you use a custom Modelfile or pull it directly from Ollama?

Was the YouTube example run on GPT-4? The results seemed quite impressive, and I can't imagine an open-source model (especially a smaller one) achieving equally pleasant results.

Thanks in advance.

abdinal1 commented Jan 05 '24 11:01

@abdinal1 I can confirm from my tests that most lightweight local models won't be able to run these correctly to the end. The best results I had (I only tested 7B models, though) were with starling-lm and mistral:instruct; at least they don't always crash on syntax errors, but the results are still not great.

Importantly, if you haven't done it yet, add the stop word Observations: to a custom Modelfile so that the model always stops when it needs to wait for tool results. Depending on the model, it can also help to play with the temperature parameter. I also sometimes got better results with small code tweaks, like making the task or tool definitions more explanatory (e.g. being explicit that the "Scrape website content" tool accepts only one URL; see the sketch just below).
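
For illustration, a more explicit tool definition could look roughly like this (just a sketch using the LangChain tool decorator; the real tool in the stock example may be named and implemented differently):

from langchain.tools import tool
import requests

@tool("Scrape website content")
def scrape_website(url: str) -> str:
    """Scrape the text content of a single website. Accepts exactly ONE
    url as a plain string (not a list, not JSON) and returns the page text."""
    # keep the output short so small local models are not overwhelmed
    response = requests.get(url, timeout=30)
    return response.text[:8000]

The important part for small models is spelling out the single-URL requirement directly in the description the agent reads.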

Anyway, I haven't tried this project with GPT-4, but don't expect similar results with small models (yet!). I see you did test with Mixtral; I haven't been able to play with it yet, but I expected it to be almost on par with GPT-4!

bonswouar commented Jan 05 '24 16:01

@bonswouar Thanks for your reply. I did try starling-lm and ended up with an output; the result is, as you described, okay-ish. Do you have any special configuration in your Modelfile, or do you use the default settings described by the project owner (e.g. temperature 0, top-p 0.5, etc.)? I'm using the default settings as described above.

I'm using 2x T4 GPUs; it does run Mixtral, but it takes very long and mostly ends in syntax errors even though it gets quite far. OpenHermes 2.5 (FP16) instantly crashes with KeyError: 'description', and the GPT-4 API delivers the same result. I will probably try some RLAIF or RLHF models from now on, after seeing that starling-lm worked.

abdinal1 commented Jan 05 '24 19:01

I'm surprised GPT-4 doesn't give you good results! Maybe it will be better in the next release; I believe some tweaks to improve the results are planned very soon. EDIT: What's the error you get, exactly? Maybe there's an issue with the example or something.

I mainly used default parameters in my Modelfiles, except temperature 0.5-0.7. You could also try nous-hermes-2-solar; someone shared an example Modelfile in the Discord channel that seems to work as well as starling-lm (but I still need to make the "Scrape website content" tool description more precise to be sure it always inputs URLs).

bonswouar commented Jan 05 '24 23:01

Sorry for my newbie question, but how do I set a stop word in the Modelfile? E.g. when I use the small phi model, would the Modelfile look as follows?

FROM phi

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """ ??? """

And how do I set the stop word?

Thanks for any hint. Regards, Tom

tblock-zz commented Jan 08 '24 19:01

I've found Mistral to work well for the README.md example code.

iplayfast commented Jan 09 '24 23:01

@tblock-zz You shouldn't ask this here, but on the crewAI Discord for example (or the Ollama one). Anyway, to answer your question, here's the Modelfile documentation: https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values. Thus it should be PARAMETER stop "Observations:".
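
So for your phi example, the complete Modelfile could look roughly like this (treat it as a sketch, not a tested configuration; the temperature value and system message are placeholders you'd tune yourself):

FROM phi

# set the temperature (as mentioned above, around 0.5-0.7 worked better for me than 1)
PARAMETER temperature 0.7

# stop generating when the agent should wait for real tool results
PARAMETER stop "Observations:"

# set the system message
SYSTEM """You are a helpful assistant. Follow the requested answer format exactly."""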

@iplayfast You could try a fine-tuned version of Mistral; it might perform even better! I've noticed that nous-hermes-2-solar is really good at following the general process, delegating, and using tools, while mistral-instruct, though less good overall, seems better for some specific tasks (for example, I get better results scraping big websites with it). Pretty cool to be able to switch Ollama models on the fly!
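
In practice that just means passing a different Ollama model to each agent, roughly like this (a sketch only: the import path depends on your LangChain version, and the model names and agent fields are placeholders, not the actual stock example's agents):

from langchain_community.llms import Ollama  # older versions: from langchain.llms import Ollama
from crewai import Agent

# these names refer to whatever you called your local models (e.g. created via `ollama create`)
solar = Ollama(model="nous-hermes-2-solar", temperature=0.6)
mistral = Ollama(model="mistral:instruct", temperature=0.6)

# scraping-heavy agent: mistral-instruct worked better for me here
researcher = Agent(
    role="Research Analyst",
    goal="Collect and summarize recent news about the company",
    backstory="An analyst who relies heavily on web scraping tools.",
    llm=mistral,
)

# agent that follows the overall process and delegates
analyst = Agent(
    role="Senior Analyst",
    goal="Turn the research notes into a final recommendation",
    backstory="A senior analyst who delegates research and writes the report.",
    llm=solar,
)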

bonswouar commented Jan 10 '24 11:01

Yup, smaller open-source models struggle with more complex tasks, but we are working on a dataset we will use to launch new open-source agentic models that will perform way better than the raw ones; we have just started to experiment with it.

joaomdmoura commented Jan 21 '24 03:01