Llama3 model not generating/taking far too long to generate simple answer. Anyone else?
With the introduction of the Llama3 model, I wanted to start testing it out with CrewAI! I recreated the following simple program from the documentation (side note: the documentation should mention the now-mandatory expected_output parameter on the Task object):
# Windows 10
# Python 3.11
# Device 0: NVIDIA GeForce GTX 1050 Ti, compute capability 6.1, VMM: yes
# 6 Core, 12 Thread AMD CPU
# 48 Gb of RAM
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
import os
os.environ["OPENAI_API_KEY"] = "NA"
llm = ChatOpenAI(
    model="crewai-llama3",
    base_url="http://localhost:11434/v1")

general_agent = Agent(
    role="Math Professor",
    goal="""Provide the solution to the students that are asking mathematical questions and give them the answer.""",
    backstory="""You are an excellent math professor that likes to solve math questions in a way that everyone can understand your solution""",
    allow_delegation=False,
    verbose=True,
    llm=llm)

task = Task(
    description="""what is 3 + 5""",
    agent=general_agent,
    expected_output="The correct answer to my question")

crew = Crew(
    agents=[general_agent],
    tasks=[task],
    verbose=2)

result = crew.kickoff()
print(result)
.env file looks like this:
OPENAI_API_BASE='http://localhost:11434/v1'
OPENAI_MODEL_NAME='llama3' #'openhermes' # Adjust based on available model
OPENAI_API_KEY=''
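One thing worth double-checking (the load_dotenv call below is an assumption on my part; it isn't in the script above): those .env values only matter if they actually end up in the process environment, e.g. via python-dotenv:
from dotenv import load_dotenv

# Load OPENAI_API_BASE / OPENAI_MODEL_NAME / OPENAI_API_KEY from the .env file
# before CrewAI and langchain read them.
load_dotenv()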
Modelfile
FROM llama3
# Set parameters
PARAMETER temperature 0.2
PARAMETER stop Result
# Sets a custom system message to specify the behavior of the chat assistant
# Leaving it blank for now.
SYSTEM """"""
...and I ran this powershell script ahead of time:
# Variables
$model_name = "llama3"
$custom_model_name = "crewai-llama3"
$modelfile_path = "I:\nasty\Python_Projects\LLM\CrewAI\Modelfile"
# Get the base model
ollama pull $model_name
# Create the model file
ollama create $custom_model_name -f "$modelfile_path"
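Before kicking off the crew, it can also be worth confirming the Ollama server is up and that the custom model was actually created. A quick sketch using the requests package against Ollama's /api/tags listing (the model names shown in the comment are just what I'd expect, not taken from the original post):
import requests

# Ask the local Ollama server which models it has available.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print(models)  # expecting something like ['crewai-llama3:latest', 'llama3:latest']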
Running the code, I don't get a generated answer even after 15 minutes (the GPU isn't being fully utilized compared to the CPU, but I'm leaving that aside for now).
When I run the llama2 model, however, by changing all the necessary variables from llama3 to llama2 in all files, I get the answer in seconds.
Is this really because the llama3 model is so much bigger than llama2, or does anyone else have this issue yet?
Hi there! Nice debugging. I have also tried llama3 without good results, but it works OK for me in terms of speed (RTX 3090).
Just a quick thing (I don't think this is the cause in your case): in your .env you have
OPENAI_MODEL_NAME='llama3' #'openhermes' # Adjust based on available model
instead of crewai-llama3.
Thanks for reaching out!
Changing the variable between llama3 and crewai-llama3 results in the same thing. I think we're just calling the llama3 model crewai-llama3 for easier model name separation, but the issue persists. My GPU isn't the greatest, so that's likely the issue; however, with such a big difference between the llama2 and llama3 models, it sure seems suspicious...
Maybe you could use the system manager (I don't recall the name, but the thing that appears on Win11 when you press Ctrl+Alt+Del) and check the VRAM usage there, comparing llama2 and llama3.
I checked the RAM usage and it is the same when using Llama2 and Llama3. The Llama3 screenshot of the task manager was in the original posting, and here is what Llama2 looks like at runtime:
Pretty similar, I just get an answer way faster with Llama2 than Llama3. I'll leave this here in case others have issues with similar hardware, but thanks for the suggestions so far!
Did you ever get any response back from the llama3 model setup? I'm having a similar experience, only I've never been patient enough to get an answer if one was coming.
I'm using the langchain community Ollama library, but everything else is the same:
from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
Whether with the base model, or one with the ModelFile modification applied, I spin forever after printing "> Entering new CrewAgentExecutor chain..." to the terminal. I did get it to work once with llama2 (plus ModelFile), but tried again and am now having similar problems there and have been unable to reproduce my success.
When I CTRL+C kill the script, the stack trace shows it is always stuck on line 186 of agents/executor.py, waiting for a call to self.agent.plan to return. Adding a print statement, I can see that the first argument intermediate_steps passed into agent.plan is an empty array, but I'm not familiar enough with the base AgentExecutor class to know whether that is a problem or not.
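If anyone else wants to see where their run is stuck without killing it, one option is the standard library's faulthandler (this is just a generic debugging sketch, not something from CrewAI):
import faulthandler
import sys

# Dump every thread's stack to stderr every 60 seconds while the crew runs,
# so you can see where execution is hanging without pressing CTRL+C.
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=sys.stderr)

result = crew.kickoff()

faulthandler.cancel_dump_traceback_later()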
As a final datapoint, in another simple test script I am able to interact with the llama3 model and it is fast and responsive, both to blocking invoke or streaming interactions.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
query = "Tell me a joke"

response = llm.invoke(query)
print(response)

for chunks in llm.stream(query):
    print(chunks)
So we just need to wait for the dev team to fix it for running llama3?
Also completely possible I have something borked with my setup somehow since I am pretty new to all this. If anybody has it working with llama3 and can post how they set it up that would be great.
I think I found a solution that is working for me and wanted to post an update.
By using the 'dolphin-llama3' model, which is also available through Ollama, and applying the ModelFile with the extra parameters (targeted at dolphin-llama3), I appear to be getting crewai fired up and returning reliably on at least a trivial local example.
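For anyone who wants to try the same thing, the only change from the earlier setup is the model name; a rough sketch (the crewai-dolphin-llama3 name here is just an example of what you might call the custom model built from the ModelFile, not something from the post above):
from langchain_community.llms import Ollama

# Assumes you have already run:
#   ollama pull dolphin-llama3
#   ollama create crewai-dolphin-llama3 -f Modelfile   (a Modelfile with FROM dolphin-llama3 plus the extra parameters)
llm = Ollama(model="crewai-dolphin-llama3")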
Quick update, I got it working using the langchain_community chat model by setting the num_predict param, as per https://github.com/ollama/ollama/issues/3760:
from langchain_community.chat_models.ollama import ChatOllama
ChatOllama(model="llama3", temperature=0.7, num_predict=128)
It works with the default llama3 or crewai-llama3 models as far as I can see, but both models keep adding "<|eot_id|><|start_header_id|>assistant<|end_header_id|>" to the answer.
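One thing that might trim those trailing tokens, assuming the langchain_community Ollama wrappers pass their stop list through to the server the same way a Modelfile PARAMETER stop does, is something like:
from langchain_community.chat_models.ollama import ChatOllama

# Hypothetical tweak: ask Ollama to stop generating at llama3's end-of-turn token.
llm = ChatOllama(
    model="llama3",
    temperature=0.7,
    num_predict=128,
    stop=["<|eot_id|>"])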
Update: works on langchain_community Ollama llm as well.
Not sure what the issue is with the ollama llama3 model and CrewAI; it seems to have a hard time stopping generation.
- Llama3 and Llama2 work correctly from the Ollama CLI
- Llama2 works correctly in CrewAI with a Modelfile as per the CrewAI documentation, i.e. PARAMETER stop Result
- Llama3 does not work correctly in CrewAI with PARAMETER stop Result
- I also tried using the Modelfile generated by "ollama show" in CrewAI, but it does not work:
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:latest
FROM llama3:latest
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.