Some models run through OpenAI-compatible APIs need the JSON schema in the instructions even when native structured output is used
Description
Currently, response_format is set to json_object for PromptedOutput if JSON object mode is supported.
It would be better to set it to json_schema when JSON schema mode is supported, just like native output does, before falling back to checking json_object.
The problem with NativeOutput is that, for many endpoints, enabling structured output alone without guidance in the prompt causes bad and unstable responses. After all, the model doesn't know the schema; it's just a logits mask. OpenAI is a notable exception though.
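Concretely, the difference on the wire for an OpenAI-compatible chat completions request looks roughly like this (a sketch; the example schema is illustrative):

# What PromptedOutput sends today when JSON object mode is supported:
response_format = {'type': 'json_object'}

# What it could send when JSON schema mode is supported, mirroring native output:
response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'output',
        'strict': True,
        'schema': {'type': 'object', 'properties': {'answer': {'type': 'string'}}},
    },
}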
I can contribute this if the proposal is OK with the maintainers.
hey @xcpky could you provide an MRE for clarity? that'll help confirm what the issue is and perhaps narrow it down to a provider
@xcpky I feel that PromptedOutput working like NativeOutput when that feature is available would be unexpected and go against the user's intent when they explicitly pick that output mode. They may have done benchmarking on performance or something, or realized the model has a restriction that causes native output not to work (e.g. older Gemini models not supporting native output with built-in tools), meaning the use of the more "naive" prompted mode could've been intentional.
The problem with NativeOutput is that, for many endpoints, enabling structured output alone without guidance in the prompt causes bad and unstable responses. After all, the model doesn't know the schema; it's just a logits mask. OpenAI is a notable exception though.
Do you have an example of that? When NativeOutput is used, we always send the JSON schema in params like OpenAI's response_format, so we assume the API passes that on to the model. But if it's only applying logit masking like you say, we should also be including the schema in instructions in that case. I'd rather fix that for the particular models that don't do that automatically, than change the behavior of prompted output.
Do you have an example of that? When NativeOutput is used, we always send the JSON schema in params like OpenAI's response_format, so we assume the API passes that on to the model. But if it's only applying logit masking like you say, we should also be including the schema in instructions in that case. I'd rather fix that for the particular models that don't do that automatically, than change the behavior of prompted output.
I think none of vLLM, SGLang, or llama-server passes the schema into the prompt.
For example, here are some kimi-k2-thinking responses I got on OpenRouter using NativeOutput:
This one outputs empty steps.
[
{
"parts": [
{
"content": "how can I solve 8x + 7 = -23",
"timestamp": "2025-11-19T13:16:38.348047Z",
"part_kind": "user-prompt"
}
],
"instructions": "You are a helpful math tutor. Guide the user through the solution step by step.",
"kind": "request"
},
{
"parts": [
{
"content": " The user wants to solve the equation 8x + 7 = -23. This is a linear equation in one variable, x. I need to guide them through the steps to isolate x.\n\nHere's the plan:\n1. Explain the goal: isolate x.\n2. Step 1: Subtract 7 from both sides to get rid of the constant term on the left side. This uses the addition/subtraction property of equality.\n3. Step 2: Divide both sides by 8 to get rid of the coefficient of x. This uses the multiplication/division property of equality.\n4. Step 3: State the final answer and suggest checking the solution.\n\nLet me write this out clearly and simply.\n\n**Step 1: Isolate the variable term.**\nThe equation is 8x + 7 = -23.\nI need to get rid of the +7 on the left side. I can do this by subtracting 7 from both sides.\n8x + 7 - 7 = -23 - 7\n8x = -30\n\n**Step 2: Isolate the variable.**\nNow I have 8x = -30.\nTo get x by itself, I need to divide both sides by 8.\n8x / 8 = -30 / 8\nx = -30/8\n\n**Step 3: Simplify.**\nThe fraction -30/8 can be simplified. Both 30 and 8 are divisible by 2.\nx = -15/4\n\n**Step 4: Check the solution (optional but good practice).**\nSubstitute x = -15/4 back into the original equation:\n8x + 7 = -23\n8(-15/4) + 7 = -23\n2 * (-15) + 7 = -23\n-30 + 7 = -23\n-23 = -23\nThe left side equals the right side, so the solution is correct.\n\nI should present this in a clear, step-by-step manner as requested. I'll use simple language and explain the 'why' behind each step.",
"id": "reasoning",
"signature": null,
"provider_name": "openrouter",
"part_kind": "thinking"
},
{
"content": " {\"steps\":[],\"final_answer\":\"x = -15/4\"}",
"id": null,
"part_kind": "text"
}
],
"usage": {
"input_tokens": 41,
"cache_write_tokens": 0,
"cache_read_tokens": 0,
"output_tokens": 448,
"input_audio_tokens": 0,
"cache_audio_read_tokens": 0,
"output_audio_tokens": 0,
"details": {}
},
"model_name": "moonshotai/kimi-k2-thinking",
"timestamp": "2025-11-19T13:16:40Z",
"kind": "response",
"provider_name": "openrouter",
"provider_details": {
"finish_reason": "stop"
},
"provider_response_id": "",
"finish_reason": "stop"
}
]
This one outputs a lot of spaces and newlines.
[
{
"parts": [
{
"content": "how can I solve 8x + 7 = -23",
"timestamp": "2025-11-19T13:18:15.909618Z",
"part_kind": "user-prompt"
}
],
"instructions": "You are a helpful math tutor. Guide the user through the solution step by step.",
"kind": "request"
},
{
"parts": [
{
"content": "The user is asking how to solve the equation 8x + 7 = -23. This is a linear equation, and I need to guide them through solving for x step by step.\n\nLet me think about the best way to teach this:\n\n1. First, I should explain the goal: isolate x on one side of the equation.\n2. Then show the steps clearly:\n - Subtract 7 from both sides to isolate the term with x\n - Divide both sides by 8 to solve for x\n3. I should explain why each step works (using inverse operations)\n4. Show the arithmetic clearly\n5. Verify the answer by plugging it back in\n6. Maybe provide a general strategy for similar problems\n\nLet me structure this as a friendly, step-by-step guide:\n\nStep 1: Isolate the variable term\n- Subtract 7 from both sides: 8x + 7 - 7 = -23 - 7\n- Simplify: 8x = -30\n\nStep 2: Solve for x\n- Divide both sides by 8: 8x/8 = -30/8\n- Simplify: x = -30/8 = -15/4 or -3.75\n\nWait, let me double-check that subtraction: -23 - 7 = -30. Yes, that's correct.\n\nThen -30/8 simplifies to -15/4. As a decimal it's -3.75. I should probably show both forms.\n\nLet me also verify: 8(-15/4) + 7 = 8/1 \u00d7 -15/4 + 7 = 2 \u00d7 -15 + 7 = -30 + 7 = -23. Perfect.\n\nI should present this in a clear, encouraging way that builds understanding, not just gives the answer.",
"id": "reasoning",
"signature": null,
"provider_name": "openrouter",
"part_kind": "thinking"
},
{
"content": "\n \t \t\n\t \n\t \n\t \n\t \n\n\n\n\t \n\n\t \n\n\t\n\n\n\t\n\n\n\t\n\n\n\t\n\n \n \n\n \t\n\n\n \n\n\n\n \n\n \n\n \n\n \n \n\n \n \n \n\n \n \n\n \t\n\n\n \n\n\n\n \n\n \n \n \n\n \n\n \n\n \n{\n \"steps\": [\n {\n \"explanation\": \"The goal is to isolate x on one side of the equation. Start by moving the constant term (+7) to the right side using the opposite operation (subtraction).\",\n \"output\": \"8x + 7 = -23\\n\\nSubtract 7 from both sides:\\n8x + 7 - 7 = -23 - 7\\n\\nSimplify:\\n8x = -30\"\n },\n {\n \"explanation\": \"Now x is multiplied by 8. To isolate x, divide both sides by 8 (the opposite of multiplication).\",\n \"output\": \"Divide both sides by 8:\\n8x / 8 = -30 / 8\\n\\nSimplify:\\nx = -30/8 = -15/4 (or -3.75)\"\n },\n {\n \"explanation\": \"Let's verify the solution by substituting x = -15/4 back into the original equation.\",\n \"output\": \"8(-15/4) + 7 = -30 + 7 = -23 \u2713\\n\\nThe left side equals the right side, so the solution is correct!\"\n }\n ],\n \"final_answer\": \"x = -15/4 (or -3.75)\",\n \"strategy\": \"Use inverse operations: subtract to move constants, divide to move coefficients. Always perform the same operation on both sides.\"\n}",
"id": null,
"part_kind": "text"
}
],
"usage": {
"input_tokens": 41,
"cache_write_tokens": 0,
"cache_read_tokens": 0,
"output_tokens": 742,
"input_audio_tokens": 0,
"cache_audio_read_tokens": 0,
"output_audio_tokens": 0,
"details": {}
},
"model_name": "moonshotai/kimi-k2-thinking",
"timestamp": "2025-11-19T13:18:17Z",
"kind": "response",
"provider_name": "openrouter",
"provider_details": {
"finish_reason": "stop"
},
"provider_response_id": "gen-1763558297-sW0Ug8wD11kkLZySzRZl",
"finish_reason": "stop"
}
]
Neither of those ever happened for me when using PromptedOutput, or NativeOutput with the schema added to the instructions.
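Roughly the code I'm running (a sketch; MathSolution is a stand-in for my actual output model, and the provider/model names match the messages above):

from pydantic import BaseModel

from pydantic_ai import Agent, NativeOutput
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openrouter import OpenRouterProvider


class Step(BaseModel):
    explanation: str
    output: str


class MathSolution(BaseModel):
    steps: list[Step]
    final_answer: str


model = OpenAIChatModel(
    'moonshotai/kimi-k2-thinking',
    provider=OpenRouterProvider(api_key='...'),
)
agent = Agent(
    model,
    instructions='You are a helpful math tutor. Guide the user through the solution step by step.',
    # The schema goes only into response_format; nothing about it in the prompt.
    output_type=NativeOutput(MathSolution),
)
result = agent.run_sync('how can I solve 8x + 7 = -23')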
@xcpky Thanks, that's helpful.
The vLLM docs explicitly state that the JSON schema should be in the prompt:
While not strictly necessary, normally it's better to indicate in the prompt the JSON schema and how the fields should be populated. This can improve the results notably in most cases.
The OpenRouter docs only say to add "descriptions to your schema properties", basically confirming that the full schema is already provided to the model. So that makes your results with kimi-k2-thinking surprising...
(None of OpenAI, Anthropic or Google say to manually include the schema when using their structured output features fwiw, so this seems like it mostly affects open source models run locally or through inference providers)
Either way, it sounds like we need to have an option on ModelProfile (see third bullet at https://ai.pydantic.dev/models/overview/#models-and-providers) to send the PromptedOutputSchema.instructions even in native output mode, so that you can enable that mode manually, and we'd set it automatically when a future vLLMProvider is used.
Note that OutlinesModel already uses a trick to unconditionally get that behavior:
https://github.com/pydantic/pydantic-ai/blob/d578b42e778d271335256408f9f9e1650232106b/pydantic_ai_slim/pydantic_ai/models/outlines.py#L529-L535
Would you be up for contributing this new feature?
@xcpky Thanks, that's helpful.
The vLLM docs explicitly state that the JSON schema should be in the prompt:
While not strictly necessary, normally it's better to indicate in the prompt the JSON schema and how the fields should be populated. This can improve the results notably in most cases.
Yes, and the SGLang docs also explicitly state that:
For better output quality, It’s advisable to explicitly include instructions in the prompt to guide the model to generate the desired format. For example, you can specify, ‘Please generate the output in the following JSON format: …’.
Those two are the most widely used inference servers, and I believe many providers are using them.
The OpenRouter docs only say to add "descriptions to your schema properties", basically confirming that the full schema is already provided to the model. So that makes your results with kimi-k2-thinking surprising...
I think OpenRouter only routes requests to the actual provider. It's impossible for OpenRouter to change the prompt or instructions before routing to the provider.
Either way, it sounds like we need to have an option on ModelProfile (see third bullet at https://ai.pydantic.dev/models/overview/#models-and-providers) to send the PromptedOutputSchema.instructions even in native output mode, so that you can enable that mode manually, and we'd set it automatically when a future vLLMProvider is used.
Sounds good.
Would you be up for contributing this new feature?
I will give it a try.
@xcpky Thank you. Let me know if you have any questions. I suggest adding a field to OpenAIModelProfile, and then using it inside OpenAIChatModel or OpenAIResponsesModel (in case you want to target their responses API instead of chat).
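For example, something along these lines (a sketch only; the field name is a placeholder and the exact wiring is up to you):

from dataclasses import dataclass

from pydantic_ai.profiles.openai import OpenAIModelProfile


@dataclass
class SketchedOpenAIModelProfile(OpenAIModelProfile):
    # Placeholder name: when True, OpenAIChatModel / OpenAIResponsesModel also
    # include the JSON schema in the instructions in native output mode, for
    # backends that only apply constrained decoding without showing the model
    # the schema.
    openai_include_schema_in_instructions_with_native_output: bool = False

# Inside the model, roughly: when the output mode is native and this flag is
# set, prepend the same instructions that PromptedOutputSchema generates.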
We can then add a new vLLMProvider, based on the existing ones for other OpenAI-compatible providers listed at https://ai.pydantic.dev/models/openai/#openai-compatible-models.
I've created a new issue for the new provider: https://github.com/pydantic/pydantic-ai/issues/3515
I feel that PromptedOutput working like NativeOutput when that feature is available would be unexpected and go against the user's intent when they explicitly pick that output mode. They may have done benchmarking on performance or something, or realized the model has a restriction that causes native output not to work (e.g. older Gemini models not supporting native output with built-in tools), meaning the use of the more "naive" prompted mode could've been intentional.
I still think it's better to use JSON schema in prompted output.
- Prompted means "prompted"; no confusion here.
- If a user has benchmarked and found that JSON schema output doesn't work, they can just set supports_json_schema_output to False (see the sketch at the end of this comment). I believe this is rare nowadays.
And normally, JSON object mode uses the same mechanism as JSON schema mode. If something doesn't work with JSON schema, it's unlikely to work with JSON object either. A common example is tool calling not being compatible with JSON object/schema mode: the official DeepSeek API doesn't support function tool calls in JSON object mode.
So I don't think it's necessary to treat JSON object and JSON schema separately.
[EDIT]: My current idea is to add an option that disables both JSON schema and JSON object support in prompted mode, and to detect both of them when that option is not set.
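For example (a sketch; supports_json_object_output is my assumption of the matching profile field name):

from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.profiles import ModelProfile

# Opt a model out of schema/object-constrained prompted output, e.g. after
# benchmarking showed it hurts; everything else keeps the profile defaults.
model = OpenAIChatModel(
    'some-model',
    profile=ModelProfile(
        supports_json_schema_output=False,
        supports_json_object_output=False,
    ),
)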
And normally, JSON object mode uses the same mechanism as JSON schema mode. If something doesn't work with JSON schema, it's unlikely to work with JSON object either.
@xcpky I don't think that's true: JSON object mode is faster (the JSON schema doesn't need to be compiled) and less error-prone (not all JSON schemas are supported).
So I don't think it's necessary to treat JSON object and JSON schema separately.
Maybe not in theory, but it is in practice because of backward compatibility.
If a user read the docs and specifically chose to use PromptedOutput for one of those reasons (performance or unsupported schema), we can't change the thing that made it "special" against what the docs used to say, force strict JSON schema structured output on them, and potentially make their agent slower or start raising errors. Per https://ai.pydantic.dev/version-policy, we can only make a change like that in v2.
The thing is that right now, Tool, Native, and Prompted output are very different "modes", and a user would only pick one in particular if they read up on the differences and chose the tradeoffs they wanted. In your proposal, PromptedOutput would basically become "NativeOutput + include the schema in the prompt", which is no longer a different mode but a behavioral flag on NativeOutput. That's why I'm suggesting adding exactly that as a model-profile-specific flag, instead of changing the meaning of PromptedOutput.
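To make that concrete (a sketch; MyOutput stands in for any output model):

from pydantic import BaseModel

from pydantic_ai import Agent, NativeOutput, PromptedOutput, ToolOutput


class MyOutput(BaseModel):
    answer: str

# Three deliberately different tradeoffs a user picks between today:
Agent('openai:gpt-4o', output_type=ToolOutput(MyOutput))      # schema via a tool definition
Agent('openai:gpt-4o', output_type=NativeOutput(MyOutput))    # schema via response_format
Agent('openai:gpt-4o', output_type=PromptedOutput(MyOutput))  # schema in the prompt, json_object at most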