Ollama tool calls / structured output via LiteLLM are unreliable
Follow-on from issue https://github.com/jackmpcollins/magentic/issues/194
Ollama models (via LiteLLM) return an incorrect function name in their tool call output, which causes magentic to fail to parse it.
```python
from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel
from pydantic import BaseModel, Field


class Superhero(BaseModel):
    name: str
    age: int = Field(description="The age of the hero, could be very old.")
    power: str = Field(examples=["Runs really fast"])
    enemies: list[str]


@prompt(
    "Create a Superhero named {name}. Use the return_superhero function. Make sure to use the correct function name.",
    model=LitellmChatModel("ollama_chat/llama3", api_base="http://localhost:11434"),
)
def create_superhero(name: str) -> Superhero: ...


create_superhero("Garden Man")
```

```
ValueError: Unknown tool call: {"id":"call_4ca84210-3b30-4cd6-a109-05044d703923","function":{"arguments":"{\"Garden Man\": {\"Name\": \"Garden Man\", \"Age\": 35, \"Power\": \"Can control plants and make them grow at an incredible rate\", \"Enemies\": [\"Pest Control\", \"Weed Killer\"]}}","name":"return_ super hero"},"type":"function","index":0}
```
I've tried a few variations of the prompt using llama3 to get it to use the correct function name but it basically never gets this right.
In this simple case (one return type, no functions) we could patch over this by ignoring the name / assuming the output is for the return_superhero function, but that would not work for the more general case of multiple return types or functions.
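For illustration only, here is a hedged sketch of what that patch could look like in user code; the helper name and the unwrapping fallback are hypothetical and not part of magentic. The idea is to ignore the reported function name and validate the arguments against the single expected return type.

```python
import json

from pydantic import BaseModel, ValidationError


def coerce_tool_call(arguments_json: str, output_type: type[BaseModel]) -> BaseModel:
    """Hypothetical patch: assume the only possible tool is the return-type
    function and ignore the (often garbled) function name from the model."""
    data = json.loads(arguments_json)
    try:
        return output_type.model_validate(data)
    except ValidationError:
        # Some models wrap the real payload under an extra key; as a fallback,
        # try validating the first nested object instead.
        first_value = next(iter(data.values()))
        return output_type.model_validate(first_value)
```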
The ultimate solution will require better support for tool calls from Ollama and LiteLLM. llama.cpp supports tool calls in its Python client (https://github.com/abetlen/llama-cpp-python#function-calling), but this is not currently exposed in Ollama's OpenAI-compatible API (https://github.com/ollama/ollama/blob/main/docs/openai.md). I have opened a new GitHub issue with Ollama for this: https://github.com/ollama/ollama/issues/4386. After that, LiteLLM will also need an update to make use of it.
Tools are now supported by Ollama, and it exposes an OpenAI-compatible API, so it should be possible to use Ollama via OpenaiChatModel by setting the base_url.
https://ollama.com/blog/tool-support
EDIT: This needs streamed tool call support in Ollama to work with magentic. Added tests in PR https://github.com/jackmpcollins/magentic/pull/281 which will pass once that support lands.
Relevant Ollama GitHub issues:
- https://github.com/ollama/ollama/issues/5796
- https://github.com/ollama/ollama/issues/5989
- https://github.com/ollama/ollama/issues/5993
@jackmpcollins, thank you very much for the update on this.
I’ve been trying to get it working with an open-source/local model, and I’ve been following your discussions across different issues. However, I’m encountering the following error:
````
Testing LLaMA (Litellm) Model...
--- OLLaMa with str ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Response from LLaMA:
```json
{"subquestions": [
    {
        "id": 1,
        "question": "What is the most recent stock price quote for TSLA?",
        "depends_on": []
    }
]}
```
--- OLLaMa with SubQuestion ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
ERROR:__main__:Error with LLaMA model: A string was returned by the LLM but was not an allowed output type. Consider updating the prompt to encourage the LLM to "use the tool". Model output: '{"name": "return_list_of_subquestion", "parameters": {"value": "[\"What is the ticker symbol […]'
````
As you can see, I created two methods: one called generate_subquestions_from_query_with_str to check whether the model responds appropriately, which worked successfully, and another called generate_subquestions_from_query to test whether the model can convert the response to list[SubQuestion], which failed.
Note: I have tested the following models, and all exhibit the same issue: firefunction-v2:latest, mistral:latest, llama3.1:latest, llama3-groq-tool-use:latest, and llama3.1:70b.
Do you have any ideas on how to resolve this?
Here’s the relevant part of my code:
```python
from magentic.chat_model.retry_chat_model import RetryChatModel
from pydantic import BaseModel, Field, ValidationError
from typing import List
from magentic import (
    OpenaiChatModel,
    UserMessage,
    chatprompt,
    SystemMessage,
    prompt_chain,
)
import logging
import json

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SubQuestion(BaseModel):
    id: int = Field(description="The unique ID of the subquestion.")
    question: str = Field(description="The subquestion itself.")
    depends_on: List[int] = Field(
        description="The list of subquestion IDs whose answer is required to answer this subquestion.",
        default_factory=list,
    )


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query(user_query: str) -> list[SubQuestion]: ...


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query_with_str(user_query: str) -> str: ...


def test_llama_model():
    print("Testing LLaMA (Litellm) Model...")
    user_query = "What is the current stock price of TSLA?"
    try:
        print("--- OLLaMa with str ---")
        response = generate_subquestions_from_query_with_str(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")

        print("--- OLLaMa with SubQuestion ---")
        response = generate_subquestions_from_query(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")
    except Exception as e:
        logger.error(f"Error with LLaMA model: {e}")


# Run tests
test_llama_model()
```
<details>
<summary>Click to expand the GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE</summary>

````python
GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE = """\
Don't generate any comments, or "Notes". Return only the JSON with Markdown.
You are a world-class state-of-the-art agent called OpenBB Agent.
Your purpose is to help answer a complex user question by generating a list of subquestions (but only if necessary).
You must also specify the dependencies between subquestions, since sometimes one subquestion will require the outcome of another in order to fully answer.
## Guidelines
* Don't try to be too clever
* Assume Subquestions are answerable by a downstream agent using tools to lookup the information.
* You must generate at least 1 subquestion.
* Generate only the subquestions required to answer the user's question
* Generate as few subquestions as possible required to answer the user's question
* A subquestion may not depend on a subquestion that follows it (i.e. comes after it).
* Assume tools can be used to look-up the answer to the subquestions (e.g., for market cap, just create a subquestion asking for the market cap rather than for the components to calculate it.)
### Example output
```json
{{"subquestions": [
    {{
        "id": 1,
        "question": "What are the latest financial statements of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 2,
        "question": "What is the most recent revenue and profit margin of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 3,
        "question": "What is the current price to earnings (P/E) ratio of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 4,
        "question": "Who are the peers of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 5,
        "question": "Which of AMZN's peers have the largest market cap?",
        "depends_on": [4]
    }}
]}}
```
"""
````
</details>
Hi @igor17400 , the issue is that ollama does not currently parse the tool calls from the streamed response. So the model output in that error log
'{"name": "return_list_of_subquestion", "parame...
is in the text content part of the response, but should instead be in the tool calls part. From what I can see, the relevant Ollama issue is https://github.com/ollama/ollama/issues/5796, and there is a PR that looks promising: https://github.com/ollama/ollama/pull/6452.
Magentic uses streamed responses internally in OpenaiChatModel. I have created issue https://github.com/jackmpcollins/magentic/issues/353 to add the option to use non-streamed responses which would be a workaround in this case.
Another option would be to try using ollama via litellm. I just tested this using the example in the description of this issue and ran into an error: litellm issue https://github.com/BerriAI/litellm/issues/6135.
So there is no simple workaround at the moment, but if any of the above issues get resolved it should allow you to use local models with structured output / tools.
Hi all - for info, the Ollama bug related to tool calls has been closed. Having installed the new release, I don't see that it has fixed the issue with magentic calls, but I'm not sure whether changes to magentic were also required?
https://github.com/ollama/ollama/issues/5796 https://github.com/ollama/ollama/releases/tag/v0.4.6
@benwhalley @igor17400 Ollama now works with magentic via OpenaiChatModel, as of the just-released https://github.com/jackmpcollins/magentic/releases/tag/v0.33.0! Depending on the model you choose, it might have trouble adhering to the function schema, so in those cases I recommend using the retries feature (a sketch follows the example below) or doing some prompt engineering, possibly using chatprompt to give examples. Please let me know if you encounter any issues.
```python
from magentic import chatprompt, AssistantMessage, OpenaiChatModel, UserMessage


@chatprompt(
    UserMessage("Return a list of fruits."),
    AssistantMessage(["apple", "banana", "cherry"]),
    UserMessage("Return a list of {category}."),
    model=OpenaiChatModel("llama3.1", base_url="http://localhost:11434/v1/"),
)
def make_list(category: str) -> list[str]: ...


print(make_list("colors"))
#> ['red', 'green', 'blue']
```
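A hedged sketch of the retries option mentioned above, assuming RetryChatModel wraps another chat model and accepts a max_retries argument (check the magentic docs for the exact signature):

```python
from magentic import chatprompt, OpenaiChatModel, UserMessage
from magentic.chat_model.retry_chat_model import RetryChatModel


@chatprompt(
    UserMessage("Return a list of {category}."),
    # Assumption: RetryChatModel feeds validation errors back to the model and
    # retries the query up to max_retries times.
    model=RetryChatModel(
        OpenaiChatModel("llama3.1", base_url="http://localhost:11434/v1/"),
        max_retries=3,
    ),
)
def make_list_with_retries(category: str) -> list[str]: ...


print(make_list_with_retries("colors"))
```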
@jackmpcollins love it! Does it support a system prompt too?
@piiq Should do! Using SystemMessage. https://magentic.dev/chat-prompting/
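For example, a minimal sketch of the same local-model setup with a system prompt added via SystemMessage (the system prompt text here is only an illustration):

```python
from magentic import chatprompt, OpenaiChatModel, SystemMessage, UserMessage


@chatprompt(
    SystemMessage("You are a concise assistant that answers with short lists."),
    UserMessage("Return a list of {category}."),
    model=OpenaiChatModel("llama3.1", base_url="http://localhost:11434/v1/"),
)
def make_list(category: str) -> list[str]: ...
```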
Thanks so much for your work on this @jackmpcollins . I ran your code example from the above reply and get this:
magentic.chat_model.base.StringNotAllowedError: A string was returned by the LLM but is not an allowed output type. Consider updating the allowed output types or modifying the prompt. Model output: '{"name": "return_list_of_str", "parameters": {"value": ["red", "green", "blue"]}}'. This is on magentic version 0.34.1.
Also, despite seeing it listed by Ollama as supporting tools, I initially tried with qwen2.5-coder:7b-base and got the error openai.BadRequestError: Error code: 400 - {'error': {'message': 'qwen2.5-coder:7b-base does not support tools', 'type': 'api_error', 'param': None, 'code': None}} (does it require a different variant?)
However, by copying your example and making it simpler (just a str), it works:
```python
@chatprompt(
    UserMessage("Return a random color"),
    AssistantMessage("red"),
    UserMessage("Return a random {category}"),
    model=OpenaiChatModel("llama3.1", base_url="http://localhost:11434/v1/"),
)
def make_random(category: str) -> str: ...


print(make_random('vehicle'))
```
output: Mitsubishi Pajero Sport
If it matters, my ollama version is 0.4.1
Hi @forensicmike , could you try with https://github.com/ollama/ollama/releases/tag/v0.4.6 or newer please? That is when tool calls in streaming responses were implemented in Ollama (magentic currently uses streaming responses for all queries). I should document this somewhere.
My apologies. I installed ollama extremely recently so didn't think this would be it. Always forget how fast this stuff is changing :) Will update this reply once I try it.
Update: Confirming this resolved the issue, thanks very much! Have a ton of agentic things I want to try but am not willing to pay per token for.