flytekit

Fix JSON schema $ref resolution in nested Pydantic models

Open • Copilot opened this pull request 3 weeks ago • 2 comments

Tracking issue

Why are the changes needed?

Pydantic v2 generates JSON schemas with $ref references for nested models (e.g., {"$ref": "#/$defs/SingleObj"}). The schema parsing logic in type_engine.py was attempting to access property_val["type"] before resolving these references, causing KeyError: 'type'.

from typing import Optional

from pydantic import BaseModel

class SingleObj(BaseModel):
    a: str

class TestDatum(BaseModel):
    b: SingleObj                      # Direct ref: {"$ref": "#/$defs/SingleObj"}
    d: list[SingleObj]                # Array items ref: {"type": "array", "items": {"$ref": ...}}
    e: Optional[list[SingleObj]]      # anyOf with ref: {"anyOf": [{"type": "array", "items": {"$ref": ...}}, {"type": "null"}]}
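To see the three shapes the fix has to handle, the models above can be passed to Pydantic v2's model_json_schema(). This is a small inspection sketch, not part of the patch; extra keys such as "title" may differ between Pydantic versions, so the comments describe the shape rather than exact output.

```python
from typing import Optional

from pydantic import BaseModel


class SingleObj(BaseModel):
    a: str


class TestDatum(BaseModel):
    b: SingleObj
    d: list[SingleObj]
    e: Optional[list[SingleObj]]


schema = TestDatum.model_json_schema()
props = schema["properties"]

print(props["b"])             # bare reference: no "type" key at all
print(props["d"]["items"])    # the array has a "type", but its items are a bare reference
print(props["e"]["anyOf"])    # an array branch plus a {"type": "null"} branch
print(list(schema["$defs"]))  # referenced definitions are collected at the top level
```

Any code that reads "type" from props["b"] or props["d"]["items"] without first following the "$ref" hits the KeyError described above.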

What changes were proposed in this pull request?

Added $ref resolution logic:

  • _resolve_json_schema_ref() dereferences schema paths like #/$defs/ModelName with proper error handling
  • Resolves references before type access, preventing KeyError
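A reference like #/$defs/ModelName is a JSON Pointer into the same document, so dereferencing it amounts to walking the root schema one path segment at a time. The sketch below is a hypothetical stand-in (resolve_json_schema_ref is an illustrative name, not flytekit's _resolve_json_schema_ref) and assumes only local "#/..." references need support.

```python
from typing import Any, Dict


def resolve_json_schema_ref(ref: str, root_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a local reference such as "#/$defs/SingleObj" inside root_schema.

    Hypothetical helper for illustration; it is not flytekit's _resolve_json_schema_ref.
    """
    if not ref.startswith("#/"):
        # Remote references (other files or URLs) are out of scope for this sketch.
        raise ValueError(f"Unsupported $ref (not a local JSON Pointer): {ref!r}")
    node: Any = root_schema
    for raw in ref[2:].split("/"):
        # RFC 6901 escaping: decode "~1" to "/" first, then "~0" to "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or token not in node:
            raise ValueError(f"Cannot resolve $ref {ref!r}: no {token!r} segment")
        node = node[token]
    if not isinstance(node, dict):
        raise ValueError(f"$ref {ref!r} does not point to a schema object")
    return node


schema = {
    "properties": {"b": {"$ref": "#/$defs/SingleObj"}},
    "$defs": {"SingleObj": {"type": "object", "properties": {"a": {"type": "string"}}}},
}
print(resolve_json_schema_ref(schema["properties"]["b"]["$ref"], schema)["type"])  # object
```

Because the walk is generic, the same function also resolves #/definitions/ModelName, the layout used by Pydantic v1, and it raises a descriptive ValueError instead of a bare KeyError when a definition is missing.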

Updated schema processing functions:

  • _handle_json_schema_property() now accepts the full schema and resolves $ref before processing
  • _get_element_type() handles resolved object types by converting them to dataclasses
  • Fixed the type annotation for schema properties: Dict[str, str] → Dict[str, Any]
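The ordering these changes rely on (resolve $ref first, then dispatch on anyOf, array, or object, and turn resolved object schemas into dataclasses) can be sketched as follows. schema_to_type and _resolve are hypothetical names; this is not flytekit's implementation, and for brevity it ignores allOf/oneOf, enums, caching of generated classes, and self-referencing models (which would recurse forever).

```python
import dataclasses
from typing import Any, Dict, List, Union, get_args, get_origin

# Hypothetical mapping from JSON schema primitive types to Python types.
_PRIMITIVES = {"string": str, "integer": int, "number": float, "boolean": bool, "null": type(None)}


def _resolve(ref: str, root: Dict[str, Any]) -> Dict[str, Any]:
    # Local references only, e.g. "#/$defs/SingleObj" or "#/definitions/SingleObj".
    if not ref.startswith("#/"):
        raise ValueError(f"Unsupported $ref: {ref!r}")
    node: Any = root
    for token in ref[2:].split("/"):
        node = node[token]
    return node


def schema_to_type(prop: Dict[str, Any], root: Dict[str, Any], name: str = "Model") -> Any:
    # Step 1: resolve $ref before reading any other key (avoids KeyError: 'type').
    if "$ref" in prop:
        name = prop["$ref"].rsplit("/", 1)[-1]
        prop = _resolve(prop["$ref"], root)
    # Step 2: anyOf becomes a Union of the converted branches (Optional[...] if one is null).
    if "anyOf" in prop:
        return Union[tuple(schema_to_type(p, root, name) for p in prop["anyOf"])]
    kind = prop.get("type")
    # Step 3: arrays recurse into "items", which may itself be a bare $ref.
    if kind == "array":
        return List[schema_to_type(prop["items"], root, name)]
    # Step 4: resolved object schemas become dataclasses; `root` is passed down so that
    # references nested inside this definition can still be resolved.
    if kind == "object" and "properties" in prop:
        fields = [(key, schema_to_type(val, root, key)) for key, val in prop["properties"].items()]
        return dataclasses.make_dataclass(prop.get("title", name), fields)
    if kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    raise ValueError(f"Unsupported schema fragment: {prop!r}")


ref = {"$ref": "#/$defs/SingleObj"}
schema = {
    "title": "TestDatum",
    "type": "object",
    "properties": {
        "b": ref,
        "d": {"type": "array", "items": ref},
        "e": {"anyOf": [{"type": "array", "items": ref}, {"type": "null"}]},
    },
    "$defs": {"SingleObj": {"title": "SingleObj", "type": "object", "properties": {"a": {"type": "string"}}}},
}
Datum = schema_to_type(schema, schema)
hints = {f.name: f.type for f in dataclasses.fields(Datum)}
print(dataclasses.is_dataclass(hints["b"]))  # True
print(get_origin(hints["d"]), get_args(hints["d"]))
print(get_args(hints["e"]))
```

Resolving at the top of the function means every caller, including the recursive calls for anyOf branches and array items, gets the dereferenced schema for free.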

Propagated schema context:

  • generate_attribute_list_from_dataclass_json_mixin() passes the schema to helper functions
  • All recursive calls maintain schema context for nested reference resolution
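Propagating the schema matters because Pydantic v2 collects every definition, including those referenced only from other definitions, in the top-level $defs. A nested definition therefore has no $defs of its own, and its references resolve only against the root. The hypothetical iter_referenced_models below walks a hand-written schema in that layout (Outer → Middle → Inner), always passing root down, and uses a seen set so self-referencing models do not loop forever.

```python
from typing import Any, Dict, Iterator, Set


def iter_referenced_models(node: Any, root: Dict[str, Any], seen: Set[str]) -> Iterator[str]:
    """Yield each model name reachable from `node`, resolving every $ref against `root`."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            name = ref[len("#/$defs/"):]
            if name not in seen:  # guard against self-referencing models
                seen.add(name)
                yield name
                # The definition may contain further $refs; they still resolve against `root`.
                yield from iter_referenced_models(root["$defs"][name], root, seen)
        for value in node.values():
            yield from iter_referenced_models(value, root, seen)
    elif isinstance(node, list):
        for item in node:
            yield from iter_referenced_models(item, root, seen)


schema = {
    "type": "object",
    "properties": {"middle": {"$ref": "#/$defs/Middle"}},
    "$defs": {
        "Middle": {"type": "object", "properties": {"inner": {"type": "array", "items": {"$ref": "#/$defs/Inner"}}}},
        "Inner": {"type": "object", "properties": {"a": {"type": "string"}}},
    },
}
print(list(iter_referenced_models(schema["properties"], schema, set())))  # ['Middle', 'Inner']
print("$defs" in schema["$defs"]["Middle"])  # False: Middle's own $ref needs the root
```

If a helper received only the Middle sub-schema, it would have nowhere to look up "#/$defs/Inner"; passing the root through every recursive call avoids that.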

How was this patch tested?

Added test_nested_pydantic_model_with_list covering:

  • Direct nested models with $ref
  • Lists of nested models with $ref in items
  • Optional lists with anyOf containing $ref

All existing Pydantic transformer tests (30/30) and dataclass tests (38/38) pass.

Setup process

N/A

Screenshots

N/A

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [ ] All commits are signed-off.

Related PRs

Docs link

Original prompt

Fix handling of JSON schema $ref references in nested Pydantic models

Problem

When Pydantic v2 generates JSON schemas for nested models (especially in lists like list[NestedModel]), it uses $ref references to definitions. The _handle_json_schema_property function in type_engine.py fails with KeyError: 'type' because it tries to access the "type" key before resolving the $ref.

Example that fails:

from pydantic import BaseModel

class SingleObj(BaseModel):
    a: str

class TestDatum(BaseModel):
    a: str
    b: SingleObj
    c: list[str]
    d: list[SingleObj]  # This fails - list of nested objects

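This generates a schema like the following (abridged; the anyOf with a null branch is what Pydantic emits for the optional form, Optional[list[SingleObj]], while a plain list[SingleObj] yields only the array branch):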

{
  "properties": {
    "d": {
      "anyOf": [
        {
          "type": "array",
          "items": {
            "$ref": "#/$defs/SingleObj"
          }
        },
        {
          "type": "null"
        }
      ]
    }
  }
}

The error occurs because:

  1. _handle_json_schema_property processes the anyOf and recursively calls itself for each item
  2. For the array item, it encounters {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}}
  3. When processing the items, it tries to access property_val["type"] on the $ref dict
  4. This fails because $ref dicts only have a "$ref" key, not a "type" key
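The failure in steps 3 and 4 can be reproduced with two simplified stand-ins, neither of which is flytekit's code: naive_type_name reads "type" directly, while ref_aware_type_name dereferences first. For brevity the ref-aware version assumes the "#/$defs/" layout.

```python
from typing import Any, Dict


def naive_type_name(prop: Dict[str, Any]) -> str:
    return prop["type"]  # assumes every schema fragment carries a "type" key


def ref_aware_type_name(prop: Dict[str, Any], root: Dict[str, Any]) -> str:
    if "$ref" in prop:
        # Resolve first (assumes "#/$defs/<Name>"), then read "type".
        name = prop["$ref"].rsplit("/", 1)[-1]
        prop = root["$defs"][name]
    return prop["type"]


schema = {
    "properties": {
        "d": {"anyOf": [{"type": "array", "items": {"$ref": "#/$defs/SingleObj"}}, {"type": "null"}]}
    },
    "$defs": {"SingleObj": {"type": "object", "properties": {"a": {"type": "string"}}}},
}
items = schema["properties"]["d"]["anyOf"][0]["items"]  # the bare {"$ref": ...} dict

try:
    naive_type_name(items)
except KeyError as exc:
    print("naive:", repr(exc))  # naive: KeyError('type')
print("ref-aware:", ref_aware_type_name(items, schema))  # ref-aware: object
```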

Solution

The _handle_json_schema_property function needs to resolve $ref references before it accesses any other schema key. Specifically, it should:

  1. Resolve the reference at the beginning of the function, before any property access
  2. Accept the full schema as a parameter so that references can be resolved
  3. Handle both reference path formats, #/$defs/ModelName and #/definitions/ModelName

The existing generate_attribute_list_from_dataclass_json function already has logic to handle $ref for nested dataclasses, and we should apply similar logic to generate_attribute_list_from_dataclass_json_mixin.

$ref must also be handled in array items and other nested structures.

This pull request was created as a result of the preceding prompt from Copilot chat.



Copilot • Dec 05 '25 14:12

Bito Automatic Review Skipped - Draft PR

Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].

flyte-bot • Dec 05 '25 14:12