`--collapse-root-models` leads to dropped annotation

Open lmmx opened this issue 8 months ago • 0 comments

Describe the bug

When parsing the FlatZinc-JSON schema (a JSON Schema spec, not an instance of one) with --collapse-root-models, I got an error from Black because the generated code was not valid Python.

The code had been invalidated by the removal of the first element of a union, leaving Union[, BarModel] instead of Union[FooModel, BarModel].
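
A quick way to confirm that the failure is a plain syntax error, rather than anything specific to Black, is to feed the annotation to Python's own parser. This is only a minimal sketch: FooModel and BarModel are the placeholder names used above, and is_valid_python is a hypothetical helper, not part of datamodel-code-generator.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Placeholder annotations mirroring the broken and the intended output.
broken = "args: List[Union[, BarModel]]"
intended = "args: List[Union[FooModel, BarModel]]"

print(is_valid_python(broken))    # False
print(is_valid_python(intended))  # True
```

Since ast.parse only checks syntax, the undefined names do not matter here; the empty first slot of the union alone makes the source unparseable.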

To Reproduce

Example schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "$id": "https://www.minizinc.org/schemas/fznjson",
  "title": "FlatZincJSON",
  "description": "A JSON representation of a FlatZinc model",

  "$defs": {
    "identifier": { "type": "string", "pattern": "[A-Za-z][A-Za-z0-9_]*" },
    "literal": {
      "oneOf": [
        { "type": "number" },
        { "$ref": "#/$defs/identifier" },
        { "type": "boolean" },
        {
          "type": "object",
          "properties": {
            "set": {
              "type": "array",
              "items": {
                "type": "array",
                "items": [{ "type": "number" }, { "type": "number" }]
              }
            }
          },
          "required": ["set"]
        },
        {
          "type": "object",
          "properties": {
            "string": { "type": "string" }
          },
          "required": ["string"]
        }
      ]
    },
    "literals": { "type": "array", "items": { "$ref": "#/$defs/literal" } },
    "argument": {
      "oneOf": [{ "$ref": "#/$defs/literals" }, { "$ref": "#/$defs/literal" }]
    },
    "annotation": {
      "oneOf": [{ "$ref": "#/$defs/annotationCall" }, { "type": "string" }]
    },
    "annotations": {
      "type": "array",
      "items": { "$ref": "#/$defs/annotation" }
    },
    "annotationArgument": {
      "oneOf": [
        { "$ref": "#/$defs/annotationLiterals" },
        { "$ref": "#/$defs/annotationLiteral" }
      ]
    },
    "annotationCall": {
      "type": "object",
      "properties": {
        "id": { "$ref": "#/$defs/identifier" },
        "args": {
          "type": "array",
          "items": { "$ref": "#/$defs/annotationArgument" }
        }
      },
      "required": ["id", "args"]
    },
    "annotationLiterals": {
      "type": "array",
      "items": { "$ref": "#/$defs/annotationLiteral" }
    },
    "annotationLiteral": {
      "oneOf": [
        { "$ref": "#/$defs/literal" },
        { "$ref": "#/$defs/annotationCall" }
      ]
    },
    "domain": {
      "type": "array",
      "items": {
        "type": "array",
        "items": [{ "type": "number" }, { "type": "number" }]
      }
    }
  },

  "type": "object",
  "properties": {
    "version": { "type": "string" },
    "variables": {
      "type": "object",
      "patternProperties": {
        "[A-Za-z][A-Za-z0-9_]*": {
          "type": "object",
          "properties": {
            "type": {
              "enum": ["bool", "float", "int", "set of int"]
            },
            "domain": { "$ref": "#/$defs/domain" },
            "rhs": { "$ref": "#/$defs/literal" },
            "introduced": {
              "type": "boolean"
            },
            "defined": {
              "type": "boolean"
            },
            "ann": { "$ref": "#/$defs/annotations" }
          },
          "required": ["type"]
        }
      }
    },
    "arrays": {
      "type": "object",
      "patternProperties": {
        "[A-Za-z][A-Za-z0-9_]*": {
          "type": "object",
          "properties": {
            "a": {
              "type": "array",
              "items": { "$ref": "#/$defs/literal" }
            },
            "ann": { "$ref": "#/$defs/annotations" },
            "introduced": {
              "type": "boolean"
            },
            "defined": {
              "type": "boolean"
            }
          },
          "required": ["a"]
        }
      }
    },
    "constraints": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": { "$ref": "#/$defs/identifier" },
          "args": {
            "type": "array",
            "items": { "$ref": "#/$defs/argument" }
          },
          "ann": { "$ref": "#/$defs/annotations" },
          "defines": { "$ref": "#/$defs/identifier" }
        },
        "required": ["id", "args"]
      }
    },
    "output": {
      "type": "array",
      "items": { "$ref": "#/$defs/identifier" }
    },
    "solve": {
      "type": "object",
      "properties": {
        "method": {
          "enum": ["satisfy", "minimize", "maximize"]
        },
        "objective": { "$ref": "#/$defs/literal" },
        "ann": { "$ref": "#/$defs/annotations" }
      },
      "required": ["method"]
    }
  },
  "required": [
    "version",
    "variables",
    "arrays",
    "output",
    "constraints",
    "solve"
  ]
}

Command line used:

datamodel-codegen --input schema.json --input-file-type=jsonschema --output-model-type="pydantic_v2.BaseModel" --collapse-root-models

Expected behavior

The definition of AnnotationCall was the source of the error in this case.

The class that Black failed to format was

class AnnotationCall(BaseModel):
    id: constr(pattern=r'[A-Za-z][A-Za-z0-9_]*') 
    args: List[Union[List[Union[, AnnotationCall]], Union[, AnnotationCall]]]

which the schema defines as

    "annotationCall": {
      "type": "object",
      "properties": {
        "id": { "$ref": "#/$defs/identifier" },
        "args": {
          "type": "array",
          "items": { "$ref": "#/$defs/annotationArgument" }
        }
      },
      "required": ["id", "args"]
    },

So I'd suggest it should have been

class AnnotationCall(BaseModel):
    id: constr(pattern=r'[A-Za-z][A-Za-z0-9_]*') 
    args: List[AnnotationArgument]

where AnnotationArgument is defined as the union of AnnotationLiteral and AnnotationLiterals

    "annotationArgument": {
      "oneOf": [
        { "$ref": "#/$defs/annotationLiterals" },
        { "$ref": "#/$defs/annotationLiteral" }
      ]
    },

which as you might guess is just the scalar vs the list of scalars:

    "annotationLiterals": {
      "type": "array",
      "items": { "$ref": "#/$defs/annotationLiteral" }
    },
    "annotationLiteral": {
      "oneOf": [
        { "$ref": "#/$defs/literal" },
        { "$ref": "#/$defs/annotationCall" }
      ]
    },

We see here that AnnotationLiteral is just a Union[Literal, AnnotationCall].

The problem seems to come from the root model, which (if we do not pass --collapse-root-models) is generated as:
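
To make the shape of the broken line easier to read, here is a rough sketch of what inlining these three aliases would give. It is not how datamodel-code-generator implements the collapse: ROOT_MODELS and collapse are hypothetical names, the table is transcribed by hand from the schema snippets above, and Literal is deliberately left as a name rather than expanded.

```python
import re

# Hypothetical alias table, transcribed by hand from the schema above.
ROOT_MODELS = {
    "AnnotationArgument": "Union[AnnotationLiterals, AnnotationLiteral]",
    "AnnotationLiterals": "List[AnnotationLiteral]",
    "AnnotationLiteral": "Union[Literal, AnnotationCall]",
}

# Match any alias name as a whole word.
ALIAS = re.compile(r"\b(" + "|".join(ROOT_MODELS) + r")\b")


def collapse(annotation: str) -> str:
    """Replace alias names with their underlying types until none remain."""
    while ALIAS.search(annotation):
        annotation = ALIAS.sub(lambda m: ROOT_MODELS[m.group(1)], annotation)
    return annotation


collapsed = collapse("List[AnnotationArgument]")
print(collapsed)
# List[Union[List[Union[Literal, AnnotationCall]], Union[Literal, AnnotationCall]]]
```

The result has the same structure as the line Black rejected, with Literal sitting in exactly the slots that came out empty, which suggests that the first member of AnnotationLiteral is what gets dropped.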

class AnnotationLiteral(RootModel[Union[Literal, AnnotationCall]]):
    root: Union[Literal, AnnotationCall]

In fact they all appear like this:

class Annotation(RootModel[Union[AnnotationCall, str]]):
    root: Union[AnnotationCall, str]


class AnnotationArgument(RootModel[Union[AnnotationLiterals, AnnotationLiteral]]):
    root: Union[AnnotationLiterals, AnnotationLiteral]


class AnnotationCall(BaseModel):
    id: Identifier
    args: List[AnnotationArgument]


class AnnotationLiterals(RootModel[List[AnnotationLiteral]]):
    root: List[AnnotationLiteral]


class AnnotationLiteral(RootModel[Union[Literal, AnnotationCall]]):
    root: Union[Literal, AnnotationCall]

As I recall, you specify a root model either 'inline' or as a class with a single field. The docs give these examples:

Pets = RootModel[List[str]]

class Pets(RootModel):
    root: list[str]

datamodel-codegen appears to be merging both formats into one Frankenmodel! 😱 This is then not processed correctly by the root model collapse routine, hence the invalid Python code and the Black crash.

Version:

  • OS: Linux
  • Python version: 3.12.6
  • datamodel-code-generator version: 0.28.4

lmmx, Mar 18 '25 00:03