
Invalid Format: Missing 'Action:' after 'Thought

Open majacinka opened this issue 1 year ago • 14 comments

My agent keeps running into this error whenever I run models locally (I tried llama2, openhermes, starling, and Mistral). The only model that didn't run into this problem was Mistral.

Very often this error is followed by another error: "Error executing tool. Missing exact 3 pipe (|) separated values. For example, coworker|task|information."

Whenever either of these two errors appeared, I couldn't get any valid output. I was experimenting with simple examples like internet scraping with DuckDuckGo and a custom Reddit scraping tool. Also worth mentioning: I don't have these problems when I use OpenAI.
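For reference, the second error comes from the delegation tool's strict input format. A minimal sketch of the kind of check that produces it (a hypothetical re-implementation for illustration, not crewAI's actual code):

```python
def parse_delegation_input(raw: str) -> tuple[str, str, str]:
    # The delegation tool expects exactly three pipe-separated values:
    # the coworker to delegate to, the task, and supporting information.
    parts = [p.strip() for p in raw.split("|")]
    if len(parts) != 3:
        raise ValueError(
            "Error executing tool. Missing exact 3 pipe (|) separated "
            "values. For example, coworker|task|information."
        )
    coworker, task, information = parts
    return coworker, task, information
```

A local model that emits JSON, extra pipes, or markdown decoration around the value fails a check like this, which is why the two errors tend to appear together.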

majacinka avatar Jan 10 '24 13:01 majacinka

I'm experiencing the same issues with a few models too, mostly llama2 and openhermes. It appears the local model "loses track" of what it's supposed to be doing and provides no action after the thought, but I could be wrong.

ChaseRichardsonGit avatar Jan 10 '24 14:01 ChaseRichardsonGit

I tried running a few 13B models, Llama 2 and Vicuna. I assumed that a bigger model = better results, but that wasn't the case. I think "losing track" is the right way to describe the issue. It looks like the local model totally forgets all the prompts and starts looping.

majacinka avatar Jan 12 '24 04:01 majacinka

same problem with phi-2

lxkaka avatar Jan 12 '24 08:01 lxkaka

I'm having the same problem with Mistral and Openhermes. CrewAI stops with the following output: Task output: Agent stopped due to iteration limit or time limit.

Henry-Brinkman avatar Jan 13 '24 14:01 Henry-Brinkman

It seems to work with OpenChat though!

Henry-Brinkman avatar Jan 13 '24 14:01 Henry-Brinkman

Happening to me too. OpenAI's APIs are running fine but running with local Ollama models fails after a certain point with that exact error.

mntolia avatar Jan 14 '24 15:01 mntolia

> It seems to work with OpenChat though!

I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?

ChaseRichardsonGit avatar Jan 14 '24 15:01 ChaseRichardsonGit

> It seems to work with OpenChat though!
>
> I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?

I tried Mixtral and it's the only one that works with crewAI somewhat consistently compared to the others. Even agents doing just code generation with codellama failed, so to my knowledge only the 8x7B model works, maybe 5 times out of 10.

badboysm890 avatar Jan 15 '24 12:01 badboysm890

> It seems to work with OpenChat though!
>
> I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?

Have you tried the new instagram_post example? That seems to work for me.

Henry-Brinkman avatar Jan 15 '24 15:01 Henry-Brinkman

I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.

kyuumeitai avatar Jan 16 '24 01:01 kyuumeitai

> It seems to work with OpenChat though!
>
> I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?
>
> Have you tried the new instagram_post example? That seems to work for me.

I have not tried the instagram_post example as I have no use for it. I'm very interested in the stock analysis agent, though I still haven't had any success getting it to work well with a local model.

ChaseRichardsonGit avatar Jan 16 '24 13:01 ChaseRichardsonGit

> I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.

Which model are you running?

ChaseRichardsonGit avatar Jan 16 '24 13:01 ChaseRichardsonGit

Hi everyone, thank you all for replying and sharing your experiences. I wanted to share my observations; somebody might find them helpful and save some time.

Over the last 10 days, I've experimented with 15 different models. My laptop has 16 GB RAM. My goal for my agents was to scrape data from a particular subreddit and turn that data into a simple, short newsletter written in layman's terms.

Of those 15 models, only 2 were able to accomplish the task: GPT-4 and Llama 2 13B (base model).

Models I've played with that have failed were:

  • Gemini Pro
  • Mistral 7B
  • Mistral 7B instruct
  • phi-2
  • Open Chat 3.5 7B
  • Nous Hermes 7B
  • Open Hermes 2.5 7B
  • Starling 7B
  • Llama 2 13B chat
  • Llama 2 13B text
  • Llama 2 7B
  • Llama 2 7B text
  • Llama 2 7B chat

I tried tweaking my prompts and played with the modelfile, setting all kinds of parameters, but the only conclusion I came to is: more parameters = more reasoning.

The reason the agents failed is that they either:

  1. didn't understand that they needed to use the scraping tool, and would instead use their training data to write the newsletter, OR
  2. scraped the data but, instead of writing the newsletter, started reacting to it. E.g., if the scraped data mentioned a new Python library, the agents would totally forget about the newsletter and try to write a Python script.

I have one more theory, but I can't test it due to insufficient RAM on my laptop. I wonder if models with 7B parameters but a context window of 16K tokens would be able to perform the task. In other words, would a bigger context window = more reasoning?

majacinka avatar Jan 16 '24 15:01 majacinka

Hello, I have experienced the same issue with OpenHermes before, but since I configured the temperature to 0.1, it works great.

I was having the looping problem before as well, but with Gemini Pro; with the temperature at 0.6, all issues are gone.

wingchiutong avatar Jan 16 '24 16:01 wingchiutong

Hey folks, finally catching up to this!

Indeed, smaller models do struggle with certain tools, especially more complex ones. I think there is room for us to optimize the crew prompts a bit; I'll look into that. But at the end of the day, smaller models do struggle with cognition.

I'm collecting data to fine-tune these models into agentic models that will be trained to behave more like agents; this should provide far more reliability even in small models.

I think a good next step here might be to mention the best models in our new docs and do some testing on slightly changing the prompts for smaller models. I'll take a look at that. Meanwhile I'm closing this one, but I'm open to re-opening it if there are requests :)

joaomdmoura avatar Jan 21 '24 15:01 joaomdmoura

I had success running a simple crew with one function. Benchmarks of the different models, and whether they worked with function calling, are below. Hopefully this helps someone! All testing was done using LM Studio as the API server.

Model Benchmarks

ChaseRichardsonGit avatar Jan 21 '24 16:01 ChaseRichardsonGit

> I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.
>
> Which model are you running?

Sorry, I didn't check my email because I was on vacation.

OpenHermes.

kyuumeitai avatar Jan 22 '24 13:01 kyuumeitai

I found another cause of this bug. When the task prompt is too strong, it changes some important (internal) keywords, like

  • Action:
  • Thought:
  • Action Input:

The agent will fail to parse the text.

For example, I had a task asking the agent to use markdown headers to denote headers. It transformed `Action:` into `**Action:**`.

To resolve this, I add the following to all of my task prompts:

These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.

kingychiu avatar Jan 24 '24 19:01 kingychiu

@kingychiu that worked. I'm running TheBloke/dolphin-2.2.1-mistral-7B-GGUF on LMStudio.

owaisafaq avatar Jan 25 '24 23:01 owaisafaq

With the @kingychiu hack, I got `Error executing tool. Missing exact 3 pipe (|) separated values.` I had to add: `Action Input should be formatted as coworker|task|context.`

allow_delegation=True,
llm=Ollama(model="codellama:34b")

jeanjerome avatar Feb 11 '24 10:02 jeanjerome

> With the @kingychiu hack, I've got Error executing tool. Missing exact 3 pipe (|) separated values. I had to add Action Input should be formatted as coworker|task|context.
>
> allow_delegation=True,
> llm=Ollama(model="codellama:34b")

I think these are other internal keywords being overwritten? It's interesting and concerning that the task prompt can hack the thinking system, haha.

kingychiu avatar Feb 11 '24 17:02 kingychiu

@kingychiu - and you added this as part of the Task?

AssetDK avatar Feb 17 '24 19:02 AssetDK

> @kingychiu - and you added this as part of the Task?

Yes. In my task, I was asking the agent to output the result in markdown format with headers. Then to fix this issue, I had to ask the agent to NOT change the format of some keywords.
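Concretely, the change is plain string concatenation onto the task description before the Task is created; a minimal sketch (the base description wording is illustrative):

```python
# Guard text appended to every task prompt so the model does not
# reformat the ReAct keywords the output parser depends on.
KEYWORD_GUARD = """
These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
"""

base_description = (
    "Summarise the scraped posts as a short newsletter. "
    "Use markdown headers to denote headers."
)

# This combined string is what goes into the Task's description field.
task_description = base_description + KEYWORD_GUARD
```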

kingychiu avatar Feb 21 '24 05:02 kingychiu

> I found another cause of this bug. When the task prompt is too strong, it changes some important (internal) keywords, like
>
>   • Action:
>   • Thought:
>   • Action Input:
>
> The agent will fail to parse the text.
>
> For example, I have a task to ask the agent to use a markdown header to denote headers. It transformed Action: into **Action:**
>
> To resolve this, I add the following to all of my task prompts:
>
> These keywords must never be translated and transformed:
> - Action:
> - Thought:
> - Action Input:
> because they are part of the thinking process instead of the output.

@kingychiu could you please tell where exactly you changed the above? In my case, the model fails to follow the instructions, so it never returns the output in the format the agent tool expects.

jaideep11061982 avatar Feb 21 '24 13:02 jaideep11061982

I am searching the issues after having a problem in essentially the same spot, and I had a slightly different idea. This is what I am seeing:


> Entering new CrewAgentExecutor chain...
Thought: I need to find the latest news about Columbia President Minouche Shafik.

Action: Search the internet

Action Input: {'search_query': 'Columbia University President Minouche ShafikI apologize for the mistake. Let me retry with a correct Action Input.

Thought: I need to find the latest news about Columbia President Minouche Shafik.
Action: Search the internet
Action Input: {"search_query": "Minouche Shafik Columbia University^CTraceback (most recent call last):

What I think is happening around "Action Input" is that the quotes or the closing curly brackets are getting escaped incorrectly by the less powerful (i.e. open-source) models. Looking for places that might cause this sort of failure, I found these locations in the codebase that might be at issue:

https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/tools/tool_output_parser.py

https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py

or more specifically here:

https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py#L43

An invalid regex might be the cause of many of these issues. ChatGPT suggested that this regex might work better:

regex = (
    r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
)
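As a quick sanity check, the suggested pattern does split a well-formed completion into its tool name and input when run with re.DOTALL, as LangChain-style parsers typically do (the sample text is illustrative):

```python
import re

regex = (
    r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
)

sample = (
    "Thought: I need to find the latest news.\n"
    "Action: Search the internet\n"
    'Action Input: {"search_query": "Minouche Shafik Columbia University"}'
)

match = re.search(regex, sample, re.DOTALL)
tool = match.group(1).strip()        # the tool name after "Action:"
tool_input = match.group(2).strip()  # the argument string after "Action Input:"
```

The failure in the log above is a different problem, though: the model never closed the Action Input value, and no regex alone can recover a truncated argument.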

contractorwolf avatar Apr 29 '24 12:04 contractorwolf

I can see the regex has now been updated to the suggested version. I am still seeing this behaviour with a custom tool that executes some code.

I printed text and action_input from parser.py and realised that, like in the example above, for some reason the trailing "} was getting cut off, so I simply added this:

tool_input = tool_input.strip('"').strip('`')  # also added this, as my model (llama3) was adding them at times
tool_input += '"}'

Now it works; I don't know why!
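For anyone hitting the same truncation, the repair amounts to stripping stray wrapper characters and re-closing the JSON. A standalone sketch of that idea (the sample input mimics the truncated log above; this is not the parser's actual code):

```python
import json

# A tool input as emitted by the model, with the closing "} cut off.
raw = '{"search_query": "Minouche Shafik Columbia University'

cleaned = raw.strip('"').strip('`')  # drop stray quotes/backticks some models add
if not cleaned.rstrip().endswith("}"):
    cleaned += '"}'                  # re-close the truncated string and object

args = json.loads(cleaned)           # now parses into a dict
```

This obviously only helps when exactly the trailing "} was lost, which matches what was observed here; a differently truncated payload would still fail to parse.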

danielgen avatar May 09 '24 15:05 danielgen

> num_ctx=16384

@danielgen could you post a code sample showing exactly how you are using this? I am using a 3090 as well. Thanks!

contractorwolf avatar May 12 '24 18:05 contractorwolf

@contractorwolf you tagged me by mistake, you meant to tag kyuumeitai based on your quote

danielgen avatar May 15 '24 11:05 danielgen

> num_ctx=16384
>
> @danielgen could you post a code sample showing exactly how you are using this? I am using a 3090 as well. Thanks!

Hey there, I did it in a Modelfile with Ollama; in fact, I did it in the Open WebUI, hehe, but you can do it with commands.
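For the command-line route, a raised num_ctx can be baked into a derived model with an Ollama Modelfile; a minimal sketch (the base model and new model name are illustrative):

```
# Modelfile — extend a base model with a larger context window
FROM openhermes
PARAMETER num_ctx 16384
```

Build it with something like `ollama create openhermes-16k -f Modelfile`, then point the crew's llm at `openhermes-16k`.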

In the end I gave up on crewAI for my project; right now I'm using a combination of n8n and Flowise in a more manual, less magical way.

kyuumeitai avatar May 15 '24 15:05 kyuumeitai