
Structured Output fails with text output + Behaviour inconsistency

Open IngLP opened this issue 7 months ago • 17 comments

Initial Checks

  • [x] I confirm that I'm using the latest version of Pydantic AI
  • [x] I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue

Description

Situation:

  • I need the LLM's text output, where it reasons and explains, so I can show it to the user.
  • I want a structured output.

This works in NON-streaming mode, but NOT in streaming mode.

See the tests below: the first fails, the second passes.

Example Code

from typing import Union

from pydantic import BaseModel
from pydantic_ai import Agent


class CallAgent(BaseModel):
    agent_name: str


agent = Agent(
    model="google_gla:gemini-2.5-flash-preview-04-17",
    output_type=Union[str, CallAgent],
    instructions="Say hello and then transfer the user to 'user_assistant' agent",
)


async def test_output_with_str(): # FAILS
    async with agent.run_stream(user_prompt="Hello") as result:
        async for msg, is_last in result.stream_structured():
            print(msg)
    assert result.get_output() == CallAgent(agent_name="user_assistant")


async def test_output_with_str_no_stream(): # PASSES
    result = await agent.run(user_prompt="Hello")
    assert result.output == CallAgent(agent_name="user_assistant")

Python, Pydantic AI & LLM client version

pydantic-ai 0.1.4
any LLM
Python 3.12

IngLP avatar Apr 25 '25 10:04 IngLP

It seems the fix could be this simple in agent.py for agent responses that include a tool call. Unfortunately, it doesn't work if the agent produces only a text message.

(screenshot: proposed patch in agent.py)

IngLP avatar Apr 25 '25 10:04 IngLP

@IngLP Can you please change output_type=Union[str, CallAgent] to output_type=CallAgent and see if it works as expected? That'll still allow the model to talk to the user before doing the handoff tool call, but will not cause PydanticAI to treat a text response as sufficient to complete the agent run.

DouweM avatar Apr 25 '25 17:04 DouweM

Tried it; it doesn't work. Removing str from output_type PREVENTS the LLM from outputting text.

IngLP avatar Apr 26 '25 09:04 IngLP

Moreover, this also blocks you from using stream_text(), which I need to show reasoning progress to the user.

IngLP avatar Apr 26 '25 09:04 IngLP

@IngLP Thanks for trying that, you're right that that would not be the desired result...

To help us debug this further, can you please port your code to the new iter-based approach described in https://github.com/pydantic/pydantic-ai/issues/1007#issuecomment-2690662109 (see the link to the docs there)? As noted there, the run_stream approach has some issues and is slated for deprecation. I don't expect iter to immediately solve your issue (although maybe!), but at least we'd be debugging and fixing this in the new approach rather than the old one.

DouweM avatar Apr 28 '25 18:04 DouweM

@IngLP Also, have you considered making call_agent a tool the model can choose to use (with appropriate prompting pushing it to do so), instead of forcing it through the output type? That way the model is free to chat before calling the tool, and PydanticAI won't get confused in determining whether the conversation is over.

DouweM avatar Apr 28 '25 18:04 DouweM

> @IngLP Also, have you considered making call_agent a tool the model can choose to use (with appropriate prompting pushing it to do so), instead of forcing it through the output type? That way the model is free to chat before calling the tool, and PydanticAI won't get confused in determining whether the conversation is over.

@DouweM indeed, this is exactly the workaround I have set up now. But it is not elegant at all, since it is an unsupported approach (see issue https://github.com/pydantic/pydantic-ai/issues/1189) and forces you to abuse the deps.

IngLP avatar Apr 28 '25 21:04 IngLP

@IngLP Did you try the new agent.iter approach from https://ai.pydantic.dev/agents/#iterating-over-an-agents-graph?

That works as expected with output_type=Union[str, CallAgent]:

import asyncio
from typing import Union

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.messages import (
    PartDeltaEvent,
    PartStartEvent,
    TextPart,
    TextPartDelta,
)

class CallAgent(BaseModel):
    agent_name: str


agent = Agent[None, CallAgent](
    model="google-gla:gemini-2.5-flash-preview-04-17",
    output_type=Union[str, CallAgent], # pyright: ignore
    instructions="Say hello and then transfer the user to 'user_assistant' agent",
)

async def test_with_iter():
    async with agent.iter(user_prompt="Hello") as run:
        async for node in run:
            if Agent.is_model_request_node(node):
                async with node.stream(run.ctx) as request_stream:
                    async for event in request_stream:
                        if isinstance(event, PartStartEvent) and isinstance(event.part, TextPart):
                            print(event.part.content, end="", flush=True)
                        elif isinstance(event, PartDeltaEvent) and isinstance(event.delta, TextPartDelta):
                            print(event.delta.content_delta, end="", flush=True)
        assert run.result.output == CallAgent(agent_name="user_assistant")

asyncio.run(test_with_iter())

DouweM avatar Apr 28 '25 22:04 DouweM

This works for this simple case, but it doesn't do all the processing and handling performed by agent.run_stream().

IngLP avatar Apr 29 '25 07:04 IngLP

@IngLP What specific behavior are you missing? iter is not as convenient as run_stream yet, but it is the direction we're moving in because of issues with run_stream like the one you ran into here.

DouweM avatar Apr 29 '25 20:04 DouweM

I mean, I would have to re-implement all the logic from here: https://github.com/pydantic/pydantic-ai/blob/cbfb31177d367d553b4bf5cc92895d33db5be0d1/pydantic_ai_slim/pydantic_ai/agent.py#L945

IngLP avatar Apr 30 '25 06:04 IngLP

@IngLP A good amount of that is already covered by the async with node.stream(run.ctx) as request_stream: statement inside async with agent.iter(user_prompt="Hello") as run:. What exactly are you expecting to have to reimplement? We're planning to add more convenience features around iter, so it'd be useful to know your specific concerns.

DouweM avatar Apr 30 '25 22:04 DouweM

Ran into this today as well.

@DouweM e.g. the maintenance of messages here https://github.com/pydantic/pydantic-ai/blob/cbfb31177d367d553b4bf5cc92895d33db5be0d1/pydantic_ai_slim/pydantic_ai/agent.py#L1003 is still important functionality. Is it possible to provide a comprehensive example of the .iter pattern that has all the expected functionality of .run_stream, at least until a more formal approach is documented?

If I just start ripping code from .run_stream, I presume the underlying bug will come with it, since I don't actually know what makes this unusable with tool calls.

smomen avatar May 09 '25 06:05 smomen

I'm also running into this issue.

CharlieEriksen avatar May 10 '25 07:05 CharlieEriksen

> @IngLP A good amount of that is already covered by the async with node.stream(run.ctx) as request_stream: statement inside async with agent.iter(user_prompt="Hello") as run:. What exactly are you expecting to have to reimplement? We're planning to add more convenience features around iter, so it'd be useful to know your specific concerns.

@DouweM I just meant all the tool-calling mechanics and other internal processing currently performed by PydanticAI. Is it already handled, or must I write code to handle it?

IngLP avatar May 23 '25 09:05 IngLP

@DouweM btw, in your code here https://github.com/pydantic/pydantic-ai/issues/1590#issuecomment-2836948080 , what happens if the agent doesn't call CallAgent tool? What are the possibilities to handle that?

IngLP avatar May 23 '25 09:05 IngLP

> I just meant all Tool calling mechanics and other internal processing currently performed by PydanticAI. Is it already handled? Or must I write code to handle it?

@IngLP That's all still handled by PydanticAI!

> what happens if the agent doesn't call CallAgent tool?

In that example, output_type is set to Union[str, CallAgent], so the model can choose to either respond with text or CallAgent. If you drop str as an option, the model will be forced to return CallAgent.

DouweM avatar May 23 '25 13:05 DouweM