oneOf with subschema array items not incorporated/generated as Any for pydantic.v2

Open kraftto opened this issue 1 year ago • 4 comments

Describe the bug

When using a JSON schema as input with a oneOf construct in which one option is an array whose items are defined in a subschema, the resulting pydantic v2 model does not incorporate the subschema definition and uses list[Any] instead.

To Reproduce

The following JSON schema snippet:

    "SpatialPlan": {
      "type": "object",
      "properties": {
        "officialDocument": {
          "title": "officialDocument",
          "description": "Link to the official documents that relate to the spatial plan.",
          "oneOf": [
            {
              "$ref": "definitions/voidable.json#/definitions/Voidable"
            },
            {
              "type": "array",
              "minItems": 1,
              "items": {
                "$ref": "definitions/ref.json#/definitions/FeatureRef"
              },
              "uniqueItems": true
            }
          ]
        }

leads to the pydantic v2 model:

class OfficialDocument(RootModel[list[Any]]):
    root: Annotated[
        list[Any],
        Field(
            description='Link to the official documents that relate to the spatial plan.',
            min_length=1,
            title='officialDocument',
        ),
    ]

class SpatialPlan(BaseModel):
    officialDocument: Annotated[
        Voidable | OfficialDocument,
        Field(
            description='Link to the official documents that relate to the spatial plan.',
            title='officialDocument',
        ),
    ]

Used commandline:

$ datamodel-codegen --target-python-version 3.10 --use-union-operator --use-standard-collections --use-schema-description --use-annotated --collapse-root-models --output-model-type pydantic_v2.BaseModel --input input.json --output output.py

Expected behavior

The resulting pydantic model should look like this:

class OfficialDocument(RootModel[list[FeatureRef]]):
    root: Annotated[
        list[FeatureRef],
        Field(
            description="Link to the official documents that relate to the spatial plan.",
            min_length=1,
            title="officialDocument",
        ),
    ]

Or, perhaps even better, the additional RootModel definition could be dropped altogether:

class SpatialPlan(BaseModel):
    officialDocument: Annotated[
        list[FeatureRef] | Voidable,
        Field(
            description="Link to the official documents that relate to the spatial plan.",
            min_length=1,
            title="officialDocument",
        ),
    ]
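To make the expected type resolution concrete, here is a minimal stdlib-only sketch. It is not the generator's actual parser: `ref_name` and `type_hint` are hypothetical helpers that handle only `$ref`, `oneOf`, and `array`, and they show how the item `$ref` could be carried into the list type instead of being replaced by `Any`.

```python
from typing import Any


def ref_name(ref: str) -> str:
    """Return the last path segment of a $ref, e.g. 'FeatureRef'."""
    return ref.rsplit("/", 1)[-1]


def type_hint(schema: dict[str, Any]) -> str:
    """Render a type hint string for a tiny subset of JSON Schema (hypothetical helper)."""
    if "$ref" in schema:
        return ref_name(schema["$ref"])
    if "oneOf" in schema:
        # Each branch is resolved on its own and joined with the union operator.
        return " | ".join(type_hint(branch) for branch in schema["oneOf"])
    if schema.get("type") == "array":
        # Keep the item type; fall back to Any only when no items are declared.
        items = schema.get("items")
        return f"list[{type_hint(items)}]" if items else "list[Any]"
    return "Any"


# The officialDocument property from the schema snippet in this report.
official_document = {
    "oneOf": [
        {"$ref": "definitions/voidable.json#/definitions/Voidable"},
        {
            "type": "array",
            "minItems": 1,
            "items": {"$ref": "definitions/ref.json#/definitions/FeatureRef"},
            "uniqueItems": True,
        },
    ]
}

print(type_hint(official_document))  # Voidable | list[FeatureRef]
```

The sketch ignores constraints such as minItems; in the real output those would still have to be attached to the list branch rather than to the union as a whole.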

Version:

  • OS: Ubuntu 22.04 (WSL)
  • Python version: 3.10
  • datamodel-code-generator version: 0.25.5

kraftto avatar Apr 10 '24 08:04 kraftto

Is this related to: https://github.com/koxudaxi/datamodel-code-generator/blob/fcab9a4d555d4b96d64bb277f974bb7507982fb2/datamodel_code_generator/parser/jsonschema.py#L681-L694

If so - or if you can provide another hint - maybe we can have a look and work on a PR. This issue is really hampering our use case.

kraftto avatar May 13 '24 07:05 kraftto

I've been looking into a similar issue on my project - so far I think it may be related to enabling the --field-constraints option, which is also enabled by using the --use-annotated option.

I'm working off of a very slightly modified version of the CycloneDX 1.5 schema, where the licenses field here is changed from an array to object type (due to some other issue with datamodel-code-generator parsing the schema). I expect to get a Python class somewhere that includes the expression and bom-ref fields. Here's what I'm seeing using datamodel-codegen 0.25.6, with the command

datamodel-codegen --input ~/temp/modified-bom-1.5.schema.json --output output-license-obj-annotated --use-annotated:

class LicenseChoice1(BaseModel):
    __root__: Annotated[
        List[Any],
        Field(
            description='A tuple of exactly one SPDX License Expression.',
            max_items=1,
            min_items=1,
            title='SPDX License Expression',
        ),
    ]

class LicenseChoice(BaseModel):
    __root__: Annotated[
        Union[List[LicenseChoiceItem], LicenseChoice1],
        Field(
            description='EITHER (list of SPDX licenses and/or named licenses) OR (tuple of one SPDX License Expression)',
            title='License Choice',
        ),
    ]

When I remove --use-annotated, I get something more like what I expect:

class LicenseChoiceItem1(BaseModel):
    class Config:
        extra = Extra.forbid

    expression: str = Field(
        ...,
        examples=[
            'Apache-2.0 AND (MIT OR GPL-2.0-only)',
            'GPL-3.0-only WITH Classpath-exception-2.0',
        ],
        title='SPDX License Expression',
    )
    bom_ref: Optional[RefType] = Field(
        None,
        alias='bom-ref',
        description='An optional identifier which can be used to reference the license elsewhere in the BOM. Every bom-ref MUST be unique within the BOM.',
        title='BOM Reference',
    )

class LicenseChoice(BaseModel):
    __root__: Union[List[LicenseChoiceItem], List[LicenseChoiceItem1]] = Field(
        ...,
        description='EITHER (list of SPDX licenses and/or named licenses) OR (tuple of one SPDX License Expression)',
        title='License Choice',
    )

I'll keep digging, but for now it appears that using annotations/field constraints ends up dropping type information somewhere down that path.
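While comparing runs with and without the flag, a small check like the following may help spot where item types were lost. It is only a sketch using the standard `ast` module (Python 3.9+ assumed); `find_any_lists` is a hypothetical helper, not part of datamodel-code-generator, and the sample string is a shortened version of the output shown above.

```python
import ast


def find_any_lists(source: str) -> list[tuple[str, int]]:
    """Return (class name, line number) for every list[Any] / List[Any] inside a class."""
    hits = []
    tree = ast.parse(source)
    for cls in (n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)):
        for node in ast.walk(cls):
            if (
                isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id in {"list", "List"}
                and isinstance(node.slice, ast.Name)
                and node.slice.id == "Any"
            ):
                hits.append((cls.name, node.lineno))
    return hits


# Shortened sample of generated code; it is only parsed, never executed.
generated = '''
class LicenseChoice1(BaseModel):
    __root__: Annotated[
        List[Any],
        Field(max_items=1, min_items=1),
    ]


class LicenseChoice(BaseModel):
    __root__: Union[List[LicenseChoiceItem], LicenseChoice1]
'''

for class_name, line in find_any_lists(generated):
    print(f"{class_name}: untyped list at line {line}")
```

Running it against the full generated file (read with `pathlib.Path(...).read_text()`) would list every model whose array items fell back to `Any`.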

jdweav avatar May 14 '24 17:05 jdweav

I can confirm that dropping --field-constraints could be considered a workaround - thanks for the hint! However, this limits the possibilities in the model, e.g. pattern constraints cannot be used anymore.

kraftto avatar May 16 '24 07:05 kraftto

I see you already provided a PR - great :)

kraftto avatar May 16 '24 07:05 kraftto