Llama3 model not generating/taking far too long to generate simple answer. Anyone else?
With the introduction of the Llama3 model, I wanted to start testing it out with CrewAI! I recreated the following simple program from the documentation (side note: the documentation should mention the now-mandatory expected_output parameter on the Task object):
# Windows 10
# Python 3.11
# Device 0: NVIDIA GeForce GTX 1050 Ti, compute capability 6.1, VMM: yes
# 6 Core, 12 Thread AMD CPU
# 48 Gb of RAM
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
import os
os.environ["OPENAI_API_KEY"] = "NA"
llm = ChatOpenAI(
    model="crewai-llama3",
    base_url="http://localhost:11434/v1")

general_agent = Agent(
    role="Math Professor",
    goal="""Provide the solution to the students that are asking mathematical questions and give them the answer.""",
    backstory="""You are an excellent math professor that likes to solve math questions in a way that everyone can understand your solution""",
    allow_delegation=False,
    verbose=True,
    llm=llm)

task = Task(
    description="""what is 3 + 5""",
    agent=general_agent,
    expected_output="The correct answer to my question")

crew = Crew(
    agents=[general_agent],
    tasks=[task],
    verbose=2)

result = crew.kickoff()
print(result)
.env file looks like this:
OPENAI_API_BASE='http://localhost:11434/v1'
OPENAI_MODEL_NAME='llama3' #'openhermes' # Adjust based on available model
OPENAI_API_KEY=''
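One thing worth double-checking (the load_dotenv call below is an assumption on my part; it isn't in the script above): those .env values only matter if they actually end up in the process environment, e.g. via python-dotenv:
from dotenv import load_dotenv

# Load OPENAI_API_BASE / OPENAI_MODEL_NAME / OPENAI_API_KEY from the .env file
# before CrewAI and langchain read them.
load_dotenv()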
Modelfile
FROM llama3
# Set parameters
PARAMETER temperature 0.2
PARAMETER stop Result
# Sets a custom system message to specify the behavior of the chat assistant
# Leaving it blank for now.
SYSTEM """"""
...and I ran this powershell script ahead of time:
# Variables
$model_name = "llama3"
$custom_model_name = "crewai-llama3"
$modelfile_path = "I:\nasty\Python_Projects\LLM\CrewAI\Modelfile"
# Get the base model
ollama pull $model_name
# Create the model file
ollama create $custom_model_name -f "$modelfile_path"
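Before kicking off the crew, it can also be worth confirming the Ollama server is up and that the custom model was actually created. A quick sketch using the requests package against Ollama's /api/tags listing (the model names shown in the comment are just what I'd expect, not taken from the original post):
import requests

# Ask the local Ollama server which models it has available.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print(models)  # expecting something like ['crewai-llama3:latest', 'llama3:latest']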
Running the code, I don't get a generated answer even after 15 minutes (the GPU isn't being fully utilized compared to the CPU, but I'm leaving that aside for now).
When I run the llama2 model, however, by changing all the necessary variables from llama3 to llama2 in all files, I get the answer in seconds.
Is this really because the llama3 model is so much bigger than llama2, or does anyone else have this issue yet?
Hi there! Nice debugging. I have also tried llama3 without good results, but it works OK for me in terms of speed (RTX 3090).
Just a quick thing (I don't think this is the cause in your case): in your .env you have
OPENAI_MODEL_NAME='llama3' #'openhermes' # Adjust based on available model
instead of crewai-llama3.
Thanks for reaching out!
Changing the variable between llama3 and crewai-llama3 results in the same thing. I think we're just calling the llama3 model crewai-llama3 for easier model name separation, but the issue persists. My GPU isn't the greatest, so that's likely the issue; however, with such a big difference between the llama2 and llama3 models, it sure seems suspicious...
Maybe you could use the system manager (I don't recall the name, but the thing that appears on Win11 when you press Ctrl+Alt+Del) and check the VRAM usage there, comparing llama2 and llama3.
I checked the RAM usage and it is the same when using Llama2 and Llama3. The Llama3 screenshot of the task manager was in the original posting, and here is what Llama2 looks like at runtime:
Pretty similar, I just get an answer way faster with Llama2 than Llama3. I'll leave this here in case others have issues with similar hardware, but thanks for the suggestions so far!
Did you ever get any response back from the llama3 model setup? I'm having a similar experience, only I've never been patient enough to get an answer if one was coming.
I'm using the langchain community Ollama library, but everything else is the same:
from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
Whether with the base model, or one with the ModelFile modification applied, I spin forever after printing "> Entering new CrewAgentExecutor chain..." to the terminal. I did get it to work once with llama2 (plus ModelFile), but tried again and am now having similar problems there and have been unable to reproduce my success.
When I CTRL+C kill the script, the stack trace shows it is always stuck on line 186 of agents/executor.py, waiting for a call to self.agent.plan to return. Adding a print statement, I can see that the first argument intermediate_steps passed into agent.plan is an empty array, but I'm not familiar enough with the base AgentExecutor class to know whether that is a problem or not.
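If anyone else wants to see where their run is stuck without killing it, one option is the standard library's faulthandler (this is just a generic debugging sketch, not something from CrewAI):
import faulthandler
import sys

# Dump every thread's stack to stderr every 60 seconds while the crew runs,
# so you can see where execution is hanging without pressing CTRL+C.
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=sys.stderr)

result = crew.kickoff()

faulthandler.cancel_dump_traceback_later()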
As a final datapoint, in another simple test script I am able to interact with the llama3 model and it is fast and responsive, both to blocking invoke or streaming interactions.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
query = "Tell me a joke"

response = llm.invoke(query)
print(response)

for chunks in llm.stream(query):
    print(chunks)
So we just need to wait for the dev team to fix it for running llama3?
Also completely possible I have something borked with my setup somehow since I am pretty new to all this. If anybody has it working with llama3 and can post how they set it up that would be great.
I think I found a solution that is working for me and wanted to post an update.
By using the 'dolphin-llama3' model, which is also available through Ollama, and applying the ModelFile with the extra parameters (targeted at dolphin-llama3), I appear to be getting crewai fired up and returning reliably on at least a trivial local example.
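For anyone who wants to try the same thing, the only change from the earlier setup is the model name; a rough sketch (the crewai-dolphin-llama3 name here is just an example of what you might call the custom model built from the ModelFile, not something from the post above):
from langchain_community.llms import Ollama

# Assumes you have already run:
#   ollama pull dolphin-llama3
#   ollama create crewai-dolphin-llama3 -f Modelfile   (a Modelfile with FROM dolphin-llama3 plus the extra parameters)
llm = Ollama(model="crewai-dolphin-llama3")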
Quick update, I got it working using the langchain_community chat model by setting the num_predict param, as per https://github.com/ollama/ollama/issues/3760:
from langchain_community.chat_models.ollama import ChatOllama
ChatOllama(model="llama3", temperature=0.7, num_predict=128)
It works with the default llama3 or crewai-llama3 models as far as I can see, but both models keep adding "<|eot_id|><|start_header_id|>assistant<|end_header_id|>" to the answer.
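One thing that might trim those trailing tokens, assuming the langchain_community Ollama wrappers pass their stop list through to the server the same way a Modelfile PARAMETER stop does, is something like:
from langchain_community.chat_models.ollama import ChatOllama

# Hypothetical tweak: ask Ollama to stop generating at llama3's end-of-turn token.
llm = ChatOllama(
    model="llama3",
    temperature=0.7,
    num_predict=128,
    stop=["<|eot_id|>"])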
Update: works on langchain_community Ollama llm as well.
Not sure what the issue is with the ollama llama3 model and CrewAI; it seems to have a hard time stopping generation.
- Llama3 and Llama2 work correctly from the Ollama CLI
- Llama2 works correctly in CrewAI with a Modelfile as per the CrewAI documentation, i.e. PARAMETER stop Result
- Llama3 does not work correctly in CrewAI with PARAMETER stop Result
- I also tried using the Modelfile generated by "ollama show" in CrewAI, but it does not work:
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:latest
FROM llama3:latest
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.