More explicit prompting to help smaller models
Hi there. Prompts are at the core of crewAI's ability to orchestrate models to use tools correctly.
When testing with smaller models (in my case, variants of mistral, llama3, and phi3), tool parameters were often missing curly braces, or were merged into the tool name, resulting in multiple error sequences.
These minor changes to the phrasing should increase reliability.
They have NOT been tested on large models, though there is a good chance that large models are able to understand without being so explicit.
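For illustration, this is the kind of rephrasing the change targets (a hypothetical before/after, not the exact wording in the diff):
'''
Before: Action Input: the input to the action
After:  Action Input: the input to the action, given as a dictionary
        enclosed in curly braces, e.g. {"argument_name": "value"}
'''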
I'll need to run the benchmarks on this one, so it might take a little longer to merge.
I tested this with mixtral-8x7b and the Action Input was now consistently correct, where before the success rate was closer to 60%. llama3-8b still failed to close the dictionary:
Action: navigate_to
Action Input: {"url": "http://google.com
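A cheap guard for this particular failure (a sketch, not crewAI's actual parsing code) is to try json.loads and append the likely missing closers before giving up:
'''
import json

def parse_action_input(raw: str) -> dict:
    # Parse a model-emitted Action Input, tolerating the common small-model
    # failure of never closing the dictionary.
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Append a missing closing brace (and quote, if needed) and retry.
        for suffix in ('}', '"}'):
            try:
                return json.loads(raw + suffix)
            except json.JSONDecodeError:
                continue
        raise
'''
With this, parse_action_input('{"url": "http://google.com') recovers {'url': 'http://google.com'} instead of erroring out.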
I spoke too soon: with further testing I am still seeing tool use failures with mixtral:8x7b. The first tool use is now consistently correct, but subsequent steps fail in a very consistent way, adding extra \ characters to the tool name. Here's an example:
Thought: I have successfully navigated to the website. The next step is to find and click on the 'Sign In' button.
Action: click\_on\_element
Action Input: {"text": "Sign In"}
Action 'click\_on\_element' don't exist, these are the only available Actions: navigate_to: navigate_to(url: str) - Loads the web page specified by the 'url' argument. When this tool completes the web page will be loaded and it can now be searched and interacted with using other tools.
Example: navigate_to({"url": "http://url"})
click_on_element: click_on_element(text: str) - Search for an element on the current page using the 'text' argument, then click on the element.
Example: click_on_element({"text": "Next"})
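One possible workaround (a sketch, not part of this PR) would be to normalize the emitted tool name before looking it up, stripping the markdown-style escapes:
'''
def normalize_tool_name(name: str) -> str:
    # Strip whitespace and the markdown-style escaping some models emit
    # (e.g. "click\_on\_element") so the name matches a registered tool
    # such as "click_on_element".
    return name.strip().replace("\\_", "_")
'''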
Thanks David.
The prompt style from phidata works quite well on small models. It would move away from crewAI's more conversational style, though, so I didn't suggest it:
'''
Provide your output as a JSON containing the following fields:
<json_fields>
["listName", "steps"]
</json_fields>
Here are the properties for each field:
<json_field_properties>
{
  "listName": {
    "description": "The title of the list",
    "type": "string"
  },
  "steps": {
    "description": "Steps",
    "items": {
      "type": "string"
    },
    "type": "array"
  }
}
</json_field_properties>
Start your response with { and end it with }.
Your output will be passed to json.loads() to convert it to a Python object.
Make sure it only contains valid JSON.
'''
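For reference, a minimal sketch of how such a prompt could be assembled from a Pydantic model (illustrative only; phidata's own implementation differs in its details):
'''
import json
from pydantic import BaseModel, Field

class TodoList(BaseModel):
    listName: str = Field(description="The title of the list")
    steps: list[str] = Field(description="Steps")

def build_json_prompt(model: type[BaseModel]) -> str:
    # Derive the field list and per-field properties from the model's
    # JSON schema and embed them in the phidata-style instruction block.
    properties = model.model_json_schema()["properties"]
    return (
        "Provide your output as a JSON containing the following fields:\n"
        f"<json_fields>\n{json.dumps(list(properties))}\n</json_fields>\n"
        "Here are the properties for each field:\n"
        f"<json_field_properties>\n{json.dumps(properties, indent=2)}\n"
        "</json_field_properties>\n"
        "Start your response with { and end it with }.\n"
        "Your output will be passed to json.loads() to convert it to a Python object.\n"
        "Make sure it only contains valid JSON."
    )
'''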
The results are almost always valid JSON, which may be enough for crewAI, but the requested object definition is often 'enhanced':
Asking for List[str] can give list[dict[str:str, str:list[str]]] or list[dict[str:str]] or dict[str:str, str:list[dict[str:str, str:list[str]]], str:str]
The model is being clever and adding meaningful sublists or dicts in valid JSON style - not what was asked for, but generally useful and creatively coercible.
Some further tests when using phidata-style prompts. Here, a hacky squish method (sketched below) reduces arbitrarily nested lists and dicts into the desired list[str] before trying to parse into the requested object. Given that the object definition includes 'type' and 'description' fields, many models want to reply with a similar dict rather than the 'string' specified by the 'type' field. I'll run these tests again with a List[dict[str:str]] return type, and I'd expect better results at parsing without having to squish.
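A squish of this sort can be as simple as (my reconstruction of the idea, not the verbatim code):
'''
def squish(value) -> list[str]:
    # Recursively flatten arbitrarily nested lists and dicts (taking dict
    # values) down to a flat list of strings.
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        out = []
        for item in value:
            out.extend(squish(item))
        return out
    return [str(value)]
'''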
When using small models, we may be condemned to barbarity.
{'dolphin-llama3:8b': {'elapsed': 54.291745448112486, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'dolphin-mistral:latest': {'elapsed': 9.661656284332276, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.9, 'valid_obj_after_squish': 1.0},
 'llama3:instruct': {'elapsed': 25.065267456902397, 'numTests': 9, 'valid_json': 1.0, 'valid_obj': 0.1111111111111111, 'valid_obj_after_squish': 0.8888888888888888},
 'mistral:latest': {'elapsed': 8.444426274299621, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'phi3:instruct': {'elapsed': 9.940554523468018, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 0.0, 'valid_obj_after_squish': 1.0},
 'wizardlm2:7b': {'elapsed': 6.53355028629303, 'numTests': 10, 'valid_json': 1.0, 'valid_obj': 1.0, 'valid_obj_after_squish': 1.0}}
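The rates above come from a loop of roughly this shape (a sketch of the harness, not the actual test code; run_model() is a hypothetical helper that returns the raw completion text, and squish() is the flattener sketched earlier):
'''
import json
import time

def benchmark(model: str, prompt: str, num_tests: int = 10) -> dict:
    # Count how often the completion is valid JSON, matches the requested
    # list[str] shape, and matches after squishing.
    counts = {"valid_json": 0, "valid_obj": 0, "valid_obj_after_squish": 0}
    start = time.time()
    for _ in range(num_tests):
        raw = run_model(model, prompt)  # hypothetical helper
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        counts["valid_json"] += 1
        steps = obj.get("steps") if isinstance(obj, dict) else None
        if isinstance(steps, list) and all(isinstance(s, str) for s in steps):
            counts["valid_obj"] += 1
        if steps is not None and all(isinstance(s, str) for s in squish(steps)):
            counts["valid_obj_after_squish"] += 1
    return {"elapsed": time.time() - start, "numTests": num_tests,
            **{k: v / num_tests for k, v in counts.items()}}
'''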
Great news: in this new version we will add the ability for people to override all of the inner prompts. Not saying we shouldn't still benchmark this, but it's something that will help with individual models.
This PR is stale because it has been open for 45 days with no activity.