More explicit prompting to help smaller models
Hi there. Prompts are at the core of crewAI's ability to orchestrate models to use tools correctly.
When testing with smaller models (in my case, variants of mistral, llama3, and phi3), tool parameters were often missing curly braces, or were merged into the tool name, resulting in multiple error sequences.
These minor changes to the phrasing should increase reliability.
They have NOT been tested on large models, though there is a good chance that large models are able to understand without being so explicit.
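For illustration, this is the kind of rephrasing the change targets (a hypothetical before/after, not the exact wording in the diff):
'''
Before: Action Input: the input to the action
After:  Action Input: the input to the action, given as a dictionary
        enclosed in curly braces, e.g. {"argument_name": "value"}
'''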
I'll need to run the benchmarks on this one, so it might take a little longer to merge.
I tested this with mixtral-8x7b and the Action Input was now consistently correct, where before the success rate was closer to 60%. llama3-8b still failed to close the dictionary:
Action: navigate_to
Action Input: {"url": "http://google.com
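A cheap guard for this particular failure (a sketch, not crewAI's actual parsing code) is to try json.loads and append the likely missing closers before giving up:
'''
import json

def parse_action_input(raw: str) -> dict:
    # Parse a model-emitted Action Input, tolerating the common small-model
    # failure of never closing the dictionary.
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Append a missing closing brace (and quote, if needed) and retry.
        for suffix in ('}', '"}'):
            try:
                return json.loads(raw + suffix)
            except json.JSONDecodeError:
                continue
        raise
'''
With this, parse_action_input('{"url": "http://google.com') recovers {'url': 'http://google.com'} instead of erroring out.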
I spoke too soon: with further testing I am still seeing tool use failures with mixtral:8x7b. The first tool use is now consistently correct, but subsequent steps fail in a very consistent way, adding extra \ characters to the tool name. Here's an example:
Thought: I have successfully navigated to the website. The next step is to find and click on the 'Sign In' button.
Action: click\_on\_element
Action Input: {"text": "Sign In"}
Action 'click\_on\_element' don't exist, these are the only available Actions: navigate_to: navigate_to(url: str) - Loads the web page specified by the 'url' argument. When this tool completes the web page will be loaded and it can now be searched and interacted with using other tools.
Example: navigate_to({"url": "http://url"})
click_on_element: click_on_element(text: str) - Search for an element on the current page using the 'text' argument, then click on the element.
Example: click_on_element({"text": "Next"})
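One possible workaround (a sketch, not part of this PR) would be to normalize the emitted tool name before looking it up, stripping the markdown-style escapes:
'''
def normalize_tool_name(name: str) -> str:
    # Strip whitespace and the markdown-style escaping some models emit
    # (e.g. "click\_on\_element") so the name matches a registered tool
    # such as "click_on_element".
    return name.strip().replace("\\_", "_")
'''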
Thanks David.
The prompt style from phidata works quite well on small models. It would move away from crewAI's more conversational style, though, so I didn't suggest it:
'''
Provide your output as a JSON containing the following fields:
<json_fields>
["listName", "steps"]
</json_fields>
Here are the properties for each field:
<json_field_properties>
{
  "listName": {
    "description": "The title of the list",
    "type": "string"
  },
  "steps": {
    "description": "Steps",
    "items": {
      "type": "string"
    },
    "type": "array"
  }
}
</json_field_properties>
Start your response with { and end it with }.
Your output will be passed to json.loads() to convert it to a Python object.
Make sure it only contains valid JSON.
'''
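For reference, a minimal sketch of how such a prompt could be assembled from a Pydantic model (illustrative only; phidata's own implementation differs in its details):
'''
import json
from pydantic import BaseModel, Field

class TodoList(BaseModel):
    listName: str = Field(description="The title of the list")
    steps: list[str] = Field(description="Steps")

def build_json_prompt(model: type[BaseModel]) -> str:
    # Derive the field list and per-field properties from the model's
    # JSON schema and embed them in the phidata-style instruction block.
    properties = model.model_json_schema()["properties"]
    return (
        "Provide your output as a JSON containing the following fields:\n"
        f"<json_fields>\n{json.dumps(list(properties))}\n</json_fields>\n"
        "Here are the properties for each field:\n"
        f"<json_field_properties>\n{json.dumps(properties, indent=2)}\n"
        "</json_field_properties>\n"
        "Start your response with { and end it with }.\n"
        "Your output will be passed to json.loads() to convert it to a Python object.\n"
        "Make sure it only contains valid JSON."
    )
'''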
The results are almost always valid JSON, which may be enough for crewAI, but the requested object definition is often 'enhanced':
Asking for List[str] can give list[dict[str:str, str:list[str]]] or list[dict[str:str]] or dict[str:str, str:list[dict[str:str, str:list[str]]], str:str]
The model is being clever and adding meaningful sublists or dicts in valid JSON style - not what was asked for, but generally useful and creatively coercible.
Some further tests when using phidata-style prompts. Here, a hacky squish method (sketched below) reduces arbitrarily nested lists and dicts into the desired list[str] before trying to parse into the requested object. Given that the object definition includes 'type' and 'description' fields, many models want to reply with a similar dict rather than the 'string' specified by the 'type' field. I'll run these tests again with a List[dict[str:str]] return type, and I'd expect better results at parsing without having to squish.
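A squish of this sort can be as simple as (my reconstruction of the idea, not the verbatim code):
'''
def squish(value) -> list[str]:
    # Recursively flatten arbitrarily nested lists and dicts (taking dict
    # values) down to a flat list of strings.
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        out = []
        for item in value:
            out.extend(squish(item))
        return out
    return [str(value)]
'''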
When using small models, we may be condemned to barbarity.
{'dolphin-llama3:8b': {'elapsed': 54.291745448112486, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'dolphin-mistral:latest': {'elapsed': 9.661656284332276, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.9, 'valid_obj_after_squish': 1.0},
 'llama3:instruct': {'elapsed': 25.065267456902397, 'numTests': 9, 'valid_json': 1.0, 'valid_obj': 0.1111111111111111, 'valid_obj_after_squish': 0.8888888888888888},
 'mistral:latest': {'elapsed': 8.444426274299621, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'phi3:instruct': {'elapsed': 9.940554523468018, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'wizardlm2:7b': {'elapsed': 6.53355028629303, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 1.0, 'valid_obj_after_squish': 1.0}}
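The rates above come from a loop of roughly this shape (a sketch of the harness, not the actual test code; run_model() is a hypothetical helper that returns the raw completion text, and squish() is the flattener sketched earlier):
'''
import json
import time

def benchmark(model: str, prompt: str, num_tests: int = 10) -> dict:
    # Count how often the completion is valid JSON, matches the requested
    # list[str] shape, and matches after squishing.
    counts = {"valid_json": 0, "valid_obj": 0, "valid_obj_after_squish": 0}
    start = time.time()
    for _ in range(num_tests):
        raw = run_model(model, prompt)  # hypothetical helper
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        counts["valid_json"] += 1
        steps = obj.get("steps") if isinstance(obj, dict) else None
        if isinstance(steps, list) and all(isinstance(s, str) for s in steps):
            counts["valid_obj"] += 1
        if steps is not None and all(isinstance(s, str) for s in squish(steps)):
            counts["valid_obj_after_squish"] += 1
    return {"elapsed": time.time() - start, "numTests": num_tests,
            **{k: v / num_tests for k, v in counts.items()}}
'''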
Great news: in this new version we will add the ability for people to override all of the inner prompts. Not saying we shouldn't still benchmark this, but it's something that will help with individual models.
This PR is stale because it has been open for 45 days with no activity.