
Pydantic conversion logic for structured outputs is broken for models containing dictionaries

Open dbczumar opened this issue 11 months ago • 5 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [X] This is an issue with the Python library

Describe the bug

There's a bug in the OpenAI Python client's logic for translating Pydantic models that contain dictionaries into structured-outputs JSON schema definitions: the resulting schema always requires dictionaries to be empty. This makes dictionary outputs far less useful, since the LLM is never allowed to populate them.

I've filed a small PR to fix this and introduce test coverage: https://github.com/openai/openai-python/pull/2003

To Reproduce

import json
from typing import Any, Dict

import pydantic

from openai.lib._pydantic import to_strict_json_schema

class GenerateToolCallArguments(pydantic.BaseModel):
    arguments: Dict[str, Any] = pydantic.Field(description="The arguments to pass to the tool")

print(json.dumps(to_strict_json_schema(GenerateToolCallArguments), indent=4))

Observe that the output inserts "additionalProperties": false into the JSON schema for the dictionary field itself. Because that field declares no properties, the dictionary must always be empty:

{
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            # THE INSERTION OF THIS LINE IS A BUG
            "additionalProperties": false
        }
    },
    "required": [
        "arguments"
    ],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": false
}
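To make the consequence concrete, here is a minimal sketch of how an object schema with no declared properties and "additionalProperties": false treats its input. The `accepts` helper is hypothetical and covers only a tiny subset of JSON Schema; it is not how the OpenAI API or any real validator is implemented.

```python
from typing import Any


def accepts(schema: dict[str, Any], value: Any) -> bool:
    """Check `value` against a tiny subset of JSON Schema.

    Only `type: object`, `properties`, `required` and a boolean
    `additionalProperties` are handled; this is an illustration,
    not a real validator.
    """
    if schema.get("type") != "object":
        return True  # other types are out of scope for this sketch
    if not isinstance(value, dict):
        return False
    properties = schema.get("properties", {})
    if any(key not in value for key in schema.get("required", [])):
        return False
    for key, item in value.items():
        if key in properties:
            if not accepts(properties[key], item):
                return False
        elif schema.get("additionalProperties", True) is False:
            return False  # undeclared key, and extras are forbidden
    return True


# The schema emitted for the `arguments` field in the report above.
arguments_schema = {
    "description": "The arguments to pass to the tool",
    "title": "Arguments",
    "type": "object",
    "additionalProperties": False,
}

print(accepts(arguments_schema, {}))                # True
print(accepts(arguments_schema, {"city": "Oslo"}))  # False
```

Under these assumptions, only the empty dictionary passes, which matches the behaviour described in this report.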

Code snippets

No response

OS

macOS

Python version

Python v3.10.12

Library version

1.59.6

dbczumar avatar Jan 10 '25 01:01 dbczumar

Tagging @RobertCraigie for visibility, just in case (saw that you've been active on recent issues) :)

dbczumar avatar Jan 10 '25 01:01 dbczumar

I'm having the same issue and can confirm that models containing dictionaries are the root problem. However, I checked the documentation again, and it does say that only additionalProperties=false is allowed.

BrunoScaglione avatar Jan 11 '25 23:01 BrunoScaglione
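On the documentation point raised above: if strict mode only accepts objects with "additionalProperties": false, one possible workaround is to model the mapping as a list of key/value entries and rebuild the dictionary afterwards. The sketch below is an assumption rather than anything from the library: `ENTRY_LIST_SCHEMA` and `entries_to_dict` are hypothetical names, and values are restricted to strings for simplicity.

```python
import json
from typing import Any

# Hypothetical strict-compatible replacement for a Dict[str, str] field:
# every object declares its keys, marks them required, and forbids extras.
ENTRY_LIST_SCHEMA: dict[str, Any] = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "key": {"type": "string"},
            "value": {"type": "string"},
        },
        "required": ["key", "value"],
        "additionalProperties": False,
    },
}


def entries_to_dict(entries: list[dict[str, str]]) -> dict[str, str]:
    """Collapse [{"key": k, "value": v}, ...] into {k: v}; later keys win."""
    return {entry["key"]: entry["value"] for entry in entries}


# Example payload shaped like what a model might return for this schema.
raw = '[{"key": "city", "value": "Oslo"}, {"key": "unit", "value": "celsius"}]'
print(entries_to_dict(json.loads(raw)))
```

The trade-off is that duplicate keys become possible in the raw output, so the conversion step has to pick a policy (here, the last entry wins).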

@RobertCraigie Any updates or additional thoughts here?

dbczumar avatar Jan 15 '25 22:01 dbczumar

I have also encountered the same issue. After some tinkering, I found more types that result in errors. The only built-in collection type that doesn't seem to be affected is list.

My code (python 3.13.1):

import json
from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema
from openai import OpenAI


class Schema(BaseModel):
    # Python built-in collections
    # `range` and `bytearray` are not supported types, so I didn't include them

    # tuple_field: tuple[int, int, int]
    list_field: list[int]
    # dict_field: dict[int, int]
    # set_field: set[int]
    # frozenset_field: frozenset[int]
    # bytes_field: bytes


print(json.dumps(to_strict_json_schema(Schema), indent=4))

api_key = ...
with OpenAI(api_key=api_key) as client:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Fill the schema with random values"}],
        response_format=Schema,
    )

print(json.dumps(response.choices[0].message.model_dump()["content"], indent=4))

dvschuyl avatar Jan 24 '25 12:01 dvschuyl

@RobertCraigie Any updates or additional thoughts here?

dbczumar avatar Feb 19 '25 08:02 dbczumar

@dbczumar @RobertCraigie The behaviour of this bug seems to have changed. Previously, the library emitted additionalProperties: false for such dictionary fields, but with pydantic v2.11+ it now emits additionalProperties: true. This now causes the following error: 'additionalProperties' is required to be supplied and to be false.

From pydantic v2.11, fields such as Dict[str, Any] are represented in JSON schema like this: { "additionalProperties": true, "type": "object" }

whereas in older versions of pydantic, i.e. up to v2.10.6, such a field was represented like this: { "type": "object" }
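Since both representations describe an open object, a pre-flight check could flag such fields before a schema is sent. The sketch below assumes a hypothetical helper, `open_object_paths`, that walks a plain schema dictionary; it is not part of openai-python or pydantic, and the two sample schemas are hand-written from the shapes quoted above.

```python
from typing import Any, Iterator


def open_object_paths(schema: Any, path: str = "$") -> Iterator[str]:
    """Yield paths of object schemas that do not set additionalProperties to False."""
    if isinstance(schema, dict):
        is_object = schema.get("type") == "object"
        if is_object and schema.get("additionalProperties") is not False:
            yield path
        for key, child in schema.items():
            yield from open_object_paths(child, f"{path}.{key}")
    elif isinstance(schema, list):
        for index, child in enumerate(schema):
            yield from open_object_paths(child, f"{path}[{index}]")


# Shape described for pydantic up to v2.10.6: a bare object.
old_style = {
    "type": "object",
    "properties": {"addressComponents": {"type": "object"}},
    "additionalProperties": False,
}

# Shape described for pydantic v2.11+: an explicit additionalProperties: true.
new_style = {
    "type": "object",
    "properties": {
        "addressComponents": {"type": "object", "additionalProperties": True}
    },
    "additionalProperties": False,
}

print(list(open_object_paths(old_style)))
print(list(open_object_paths(new_style)))
```

Both calls report the same path, `$.properties.addressComponents`, so a check like this would behave the same across the two pydantic versions.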

To Reproduce

from typing import Any
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    addressComponents: dict[str, Any]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Generate a dummy person's info"}
    ],
    response_format=Person,
)
print(completion.choices[0].message.parsed)

OS

macOS

Python version

3.13.3

OpenAI Library Version

1.78.1

Pydantic Version

2.11.4

choudhary-akash avatar May 14 '25 20:05 choudhary-akash

I identified that the current to_strict_json_schema logic incorrectly forces Dict fields (e.g., Dict[str, Any]) to always have "additionalProperties": false, which makes dictionary outputs unusable since the LLM can never populate them.

My fix updates the schema conversion so that Dict[str, Any] fields are mapped to "additionalProperties": true (or an unrestricted schema). This aligns with Pydantic’s intended behavior and ensures structured outputs can contain arbitrary key-value pairs. The example is the same as in the original report:

import json
from typing import Any, Dict

import pydantic

from openai.lib._pydantic import to_strict_json_schema

class GenerateToolCallArguments(pydantic.BaseModel):
    arguments: Dict[str, Any] = pydantic.Field(description="The arguments to pass to the tool")

print(json.dumps(to_strict_json_schema(GenerateToolCallArguments), indent=4))

After the fix, the output looks like this:

{
    "properties": {
        "arguments": {
            "description": "The arguments to pass to the tool",
            "title": "Arguments",
            "type": "object",
            "additionalProperties": true
        }
    },
    "required": [
        "arguments"
    ],
    "title": "GenerateToolCallArguments",
    "type": "object",
    "additionalProperties": false
}

With this change, the schema allows arbitrary dictionaries in structured outputs, making it more useful and consistent with Pydantic’s semantics.

LuminaX-alt avatar Aug 18 '25 15:08 LuminaX-alt