Pydantic conversion logic for structured outputs is broken for models containing dictionaries
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
There's a bug in the OpenAI Python client's logic for translating pydantic models that contain dictionaries into structured outputs JSON schema definitions: dictionary fields are always required to be empty in the resulting JSON schema. This makes dictionary outputs significantly less useful, since the LLM is never allowed to populate them.
I've filed a small PR to fix this and introduce test coverage: https://github.com/openai/openai-python/pull/2003
To Reproduce
import json
from typing import Any, Dict

import pydantic
from openai.lib._pydantic import to_strict_json_schema

class GenerateToolCallArguments(pydantic.BaseModel):
    arguments: Dict[str, Any] = pydantic.Field(description="The arguments to pass to the tool")

print(json.dumps(to_strict_json_schema(GenerateToolCallArguments), indent=4))
Observe that the output inserts "additionalProperties": false into the schema for the arguments field. Because that field declares no properties of its own, the dictionary must always be empty:
{
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            # THE INSERTION OF THIS LINE IS A BUG
            "additionalProperties": false
        }
    },
    "required": [
        "arguments"
    ],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": false
}
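To see why the nested "additionalProperties": false forces an empty dictionary, here is a minimal sketch of the relevant JSON Schema rule using only the standard library. The helper `object_accepts` is hypothetical: it looks only at `properties` and a boolean `additionalProperties`, it is not a full validator, and it is not part of `openai` or `pydantic`.

```python
from typing import Any, Dict


def object_accepts(schema: Dict[str, Any], value: Dict[str, Any]) -> bool:
    """Hypothetical helper: check only the keys of `value` against an object schema.

    Handles just `properties` and a boolean `additionalProperties`.
    """
    declared = set(schema.get("properties", {}))
    # JSON Schema treats a missing `additionalProperties` as "extra keys allowed".
    extras_allowed = schema.get("additionalProperties", True) is not False
    return extras_allowed or all(key in declared for key in value)


# The schema generated for the `arguments` field in the report above.
arguments_schema = {
    "description": "The arguments to pass to the tool",
    "title": "Arguments",
    "type": "object",
    "additionalProperties": False,
}

empty_ok = object_accepts(arguments_schema, {})
populated_ok = object_accepts(arguments_schema, {"city": "Paris"})
print(empty_ok, populated_ok)
```

With no declared properties and extra keys forbidden, the only dictionary that passes is the empty one, which matches the behaviour described in this report.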
Code snippets
No response
OS
macOS
Python version
Python v3.10.12
Library version
1.59.6
Tagging @RobertCraigie for visibility, just in case (saw that you've been active on recent issues) :)
I'm having the same issue and can confirm that models with dictionaries are the root problem. But I checked the documentation again, and it does say that only additionalProperties: false is allowed.
@RobertCraigie Any updates or additional thoughts here?
I have also encountered the same issue. After some tinkering, I found more types that result in errors. The only built-in collection type that doesn't seem to be affected is list.
My code (Python 3.13.1):
import json

from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema
from openai import OpenAI

class Schema(BaseModel):
    # Python built-in collections
    # `range` and `bytearray` are not supported types, so I didn't include them
    # tuple_field: tuple[int, int, int]
    list_field: list[int]
    # dict_field: dict[int, int]
    # set_field: set[int]
    # frozenset_field: frozenset[int]
    # bytes_field: bytes

print(json.dumps(to_strict_json_schema(Schema), indent=4))

api_key = ...
with OpenAI(api_key=api_key) as client:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Fill the schema with random values"}],
        response_format=Schema,
    )
    print(json.dumps(response.choices[0].message.model_dump()["content"], indent=4))
@RobertCraigie Any updates or additional thoughts here?
@dbczumar @RobertCraigie
The behaviour of this bug seems to have changed. Previously, it returned additionalProperties: false for such dictionary fields, but with pydantic v2.11+ it now returns additionalProperties: true, which causes the following error:
'additionalProperties' is required to be supplied and to be false.
Starting with pydantic v2.11, fields such as Dict[str, Any] are represented in the JSON schema like this:
{ "additionalProperties": true, "type": "object" }
whereas in older versions of pydantic, i.e. up to v2.10.6, such a field was represented like this:
{ "type": "object" }
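Since the two pydantic versions emit different shapes for the same field, any code that post-processes the schema has to recognize both. Here is a minimal sketch, assuming a hypothetical helper `is_free_form_dict` (not part of either library) that considers only the two shapes quoted above:

```python
from typing import Any, Dict


def is_free_form_dict(schema: Dict[str, Any]) -> bool:
    """Hypothetical helper: True for an object schema with no declared
    properties that does not forbid extra keys."""
    if schema.get("type") != "object" or schema.get("properties"):
        return False
    # A missing `additionalProperties` means the same as `true` in JSON Schema.
    return schema.get("additionalProperties", True) is True


older_shape = {"type": "object"}  # pydantic up to v2.10.6, per the comment above
newer_shape = {"additionalProperties": True, "type": "object"}  # pydantic v2.11+

print(is_free_form_dict(older_shape), is_free_form_dict(newer_shape))
```

Typed dictionaries, where `additionalProperties` is itself a schema, are deliberately left out of this sketch.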
To Reproduce
from typing import Any

from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    addressComponents: dict[str, Any]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Generate a dummy person's info"}
    ],
    response_format=Person,
)
print(completion.choices[0].message.parsed)
OS
macOS
Python version
3.13.3
OpenAI Library Version
1.78.1
Pydantic Version
2.11.4
I identified that the current to_strict_json_schema logic incorrectly forces Dict fields (e.g., Dict[str, Any]) to always have "additionalProperties": false, which makes dictionary outputs unusable since the LLM can never populate them.
My fix updates the schema conversion so that Dict[str, Any] fields are mapped to "additionalProperties": true (or an unrestricted schema). This aligns with Pydantic’s intended behavior and ensures structured outputs can contain arbitrary key-value pairs.
import json
from typing import Any, Dict

import pydantic
from openai.lib._pydantic import to_strict_json_schema

class GenerateToolCallArguments(pydantic.BaseModel):
    arguments: Dict[str, Any] = pydantic.Field(description="The arguments to pass to the tool")

print(json.dumps(to_strict_json_schema(GenerateToolCallArguments), indent=4))
With the fix applied, the example above produces the following output:
{
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            "additionalProperties": true
        }
    },
    "required": ["arguments"],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": false
}
With this change, the model allows arbitrary dictionaries in structured outputs, making the JSON schema more useful and consistent with Pydantic’s semantics.
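The transformation described in this comment can be sketched without touching the library. The function `relax_dict_fields` below is a hypothetical illustration, not the code from the PR or from `openai.lib._pydantic`: it walks a schema, keeps "additionalProperties": false on objects that declare `properties`, and opens objects that declare none. Per the error message quoted earlier in this thread, the API may still reject "additionalProperties": true in strict mode, so this shows only the schema rewrite itself.

```python
import copy
import json
from typing import Any, Dict


def relax_dict_fields(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: close objects with declared properties, open the rest."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            is_object = node.get("type") == "object"
            # Keep typed dictionaries, where additionalProperties is itself a schema.
            typed = isinstance(node.get("additionalProperties"), dict)
            if is_object and not typed:
                has_fields = bool(node.get("properties"))
                node["additionalProperties"] = not has_fields
            for child in list(node.values()):
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(result)
    return result


# The schema from the original report, as currently generated.
generated = {
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            "additionalProperties": False,
        }
    },
    "required": ["arguments"],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": False,
}

relaxed = relax_dict_fields(generated)
print(json.dumps(relaxed, indent=4))
```

Working on a deep copy keeps the sketch side-effect free, which makes it easy to compare the schema before and after.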