
finish_reason for RunResult

Open milest opened this issue 11 months ago • 5 comments

Is there a way to view the finish_reason for a RunResult? (eg: end_turn, max_tokens, refusal, etc.)

I'm using Vertex, and I thought it might be available in RunResult.usage().details, but that's actually None for me.

The reason I ask is that when a completion runs longer than max_tokens, I need to know so I can give the LLM a chance to finish the completion.

I'm using pydantic-ai-slim[vertexai]==0.0.14 and VertexAIModel('gemini-1.5-flash')
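
Something like this is the kind of check I'd like to be able to write — a rough sketch only, since the finish_reason attribute here is hypothetical and doesn't exist on RunResult today:

from pydantic_ai import Agent
from pydantic_ai.models.vertexai import VertexAIModel

agent = Agent(VertexAIModel('gemini-1.5-flash'))

result = agent.run_sync('Summarize this long document...')

# Hypothetical: RunResult does not currently expose finish_reason.
if getattr(result, 'finish_reason', None) == 'max_tokens':
    # Give the model a chance to finish from where it stopped.
    result = agent.run_sync(
        'Please continue your previous answer.',
        message_history=result.all_messages(),
    )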

milest avatar Dec 20 '24 05:12 milest

I think that's a reasonable request.

samuelcolvin avatar Dec 20 '24 07:12 samuelcolvin

Indeed, this would be quite useful, as we want to be able to adjust max_tokens to better manage our usage across providers.

Right now, when I run with a BaseModel as the result_type and max_tokens=5 (failing on purpose), I get these results:

OpenAI: gpt-4o-2024-11-20

pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: gpt-4o-2024-11-20, body: {'message': 'Could not finish the message because max_tokens was reached. Please try again with higher max_tokens.', 'type': 'invalid_request_error', 'param': None, 'code': None}

Bedrock: anthropic.claude-3-5-sonnet-20240620-v1:0

pydantic_core._pydantic_core.ValidationError: 1 validation error for DynamicModel
  Invalid JSON: EOF while parsing an object at line 1 column 1 [type=json_invalid, input_value='{', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid

Gemini: gemini-2.0-flash

pydantic_core._pydantic_core.ValidationError: 1 validation error for DynamicModel
  Invalid JSON: EOF while parsing a value at line 1 column 0 [type=json_invalid, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid

So ideally the finish_reason would always be present in the exception so we can determine the root cause of the issue.
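
Today the best we can do is catch the provider-specific failures and guess that max_tokens was the cause. A rough sketch based on the tracebacks above (the error-sniffing and retry logic are assumptions, not an established pattern):

from pydantic import BaseModel, ValidationError
from pydantic_ai import Agent
from pydantic_ai.exceptions import ModelHTTPError


class DynamicModel(BaseModel):
    answer: str


agent = Agent('openai:gpt-4o-2024-11-20', result_type=DynamicModel)

try:
    result = agent.run_sync('...', model_settings={'max_tokens': 5})
except ModelHTTPError as exc:
    # OpenAI surfaces the truncation as an HTTP 400; we have to sniff the message text.
    if 'max_tokens' in str(exc):
        ...  # retry with a larger limit
except ValidationError:
    # Bedrock and Gemini just return truncated JSON, so all we see is a parse failure,
    # with no way to tell truncation apart from a genuinely malformed response.
    ...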

ianardee avatar Apr 04 '25 11:04 ianardee

I was implementing this when I noticed that the OpenAI Responses API doesn't have this field. Is the field not necessary? 🤔

Kludex avatar Apr 18 '25 11:04 Kludex

I think it's in ResponseOutputMessage.status='incomplete'

The details appear to be in incomplete_details

main.py

import os
from pprint import pp

from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)
# Deliberately tiny max_output_tokens so the response is truncated
# and we can see how the Responses API reports it.
response = client.responses.create(
    model="gpt-4o",
    instructions="You are a coding assistant that talks like a pirate.",
    input="How do I check if a Python object is an instance of a class?",
    max_output_tokens=16,
)
pp(dict(response))

uv run --with openai main.py

{'id': 'resp_68025457034c81919224f30793a71b1403aa0aff7e3119fc',
 'created_at': 1744983127.0,
 'error': None,
 'incomplete_details': IncompleteDetails(reason='max_output_tokens'),
 'instructions': 'You are a coding assistant that talks like a pirate.',
 'metadata': {},
 'model': 'gpt-4o-2024-08-06',
 'object': 'response',
 'output': [ResponseOutputMessage(id='msg_6802545783108191a725b6c4e6a95c6503aa0aff7e3119fc', content=[ResponseOutputText(annotations=[], text='Arrr matey! To check if a Python object be an instance of a', type='output_text')], role='assistant', status='incomplete', type='message')],
 'parallel_tool_calls': True,
 'temperature': 1.0,
 'tool_choice': 'auto',
 'tools': [],
 'top_p': 1.0,
 'max_output_tokens': 16,
 'previous_response_id': None,
 'reasoning': Reasoning(effort=None, generate_summary=None, summary=None),
 'service_tier': 'default',
 'status': 'incomplete',
 'text': ResponseTextConfig(format=ResponseFormatText(type='text')),
 'truncation': 'disabled',
 'usage': ResponseUsage(input_tokens=37, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=16, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=53),
 'user': None,
 'store': True}
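
So, at least for the Responses API, a caller can detect the truncation from the fields above. A minimal sketch, continuing from the response object in the script:

# Detect truncation via status / incomplete_details on the Responses API object.
if response.status == "incomplete" and response.incomplete_details is not None:
    if response.incomplete_details.reason == "max_output_tokens":
        print("Response was cut off by max_output_tokens")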

milest avatar Apr 18 '25 13:04 milest

We just spent a lot of time tracking down rising retry counts for LLM tool calls; the cause turned out to be a small max_tokens configured for the provider (we had used the default value).

Being able to see finish_reason=length could help us monitor similar issues.

Wh1isper avatar May 27 '25 02:05 Wh1isper

Isn't this closed by:

  • https://github.com/pydantic/pydantic-ai/issues/886 (Add id and finish_reason to ModelResponse #886)
  • https://github.com/pydantic/pydantic-ai/pull/2590 (Add ModelResponse.finish_reason and set provider_response_id while streaming #2590)

?
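
If so, something like this should work now — a sketch assuming the ModelResponse.finish_reason field added by #2590; I haven't verified the exact literal values it reports:

from pydantic_ai import Agent
from pydantic_ai.messages import ModelResponse

agent = Agent('openai:gpt-4o')
result = agent.run_sync('Write a very long story.', model_settings={'max_tokens': 16})

# The last message in the history should be the model's response,
# which now carries finish_reason (per #886 / #2590).
last = result.all_messages()[-1]
if isinstance(last, ModelResponse) and last.finish_reason == 'length':
    print('Truncated by max_tokens; retry with a larger limit.')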

dsfaccini avatar Oct 20 '25 19:10 dsfaccini