datamodel-code-generator
`schema` -> `msgspec.Struct` / `typing.TypedDict` converts `List[Any]` to `List`
I'm currently generating a series of msgspec.Structs from previously generated JSON schemas, which in turn originate from custom-written pydantic models.
One such model is the following:
from typing import Any, Dict, List

from pydantic import BaseModel, ConfigDict, Field, RootModel
from typing_extensions import Annotated


class DataFrameForDatumPage(RootModel):
    root: List[str] = Field(alias="Dataframe")


class DatumPage(BaseModel):
    """Page of documents to reference a quanta of externally-stored data"""

    model_config = ConfigDict(extra="forbid")

    datum_id: Annotated[
        DataFrameForDatumPage,
        Field(
            description="Array unique identifiers for each Datum (akin to 'uid' for "
            "other Document types), typically formatted as '<resource>/<integer>'"
        ),
    ]
    datum_kwargs: Annotated[
        Dict[str, List[Any]],
        Field(
            description="Array of arguments to pass to the Handler to "
            "retrieve one quanta of data"
        ),
    ]
    resource: Annotated[
        str,
        Field(
            description="The UID of the Resource to which all Datums in the page belong"
        ),
    ]
The schema -> struct/typed_dict pipeline is the following:
import json
from pathlib import Path

import datamodel_code_generator as generator

# just a file I locally created with the model described above
from datamodel_sandbox.pydantic_model import DatumPage

# model_json_schema() returns a dict; indentation is handled by json.dumps
with open("schema.json", "w") as f:
    f.write(json.dumps(DatumPage.model_json_schema(), indent=4))

output_paths = [
    Path("msgspec_struct.py"),
    Path("typed_dict.py"),
]
output_model_types = [
    generator.DataModelType.MsgspecStruct,
    generator.DataModelType.TypingTypedDict,
]

# generate one output file per target model type from the same schema
for output_path, output_model_type in zip(output_paths, output_model_types):
    generator.generate(
        input_=Path("schema.json"),
        input_file_type=generator.InputFileType.JsonSchema,
        output=output_path,
        output_model_type=output_model_type,
        target_python_version=generator.PythonVersion.PY_38,
        use_schema_description=True,
        use_field_description=True,
        use_annotated=True,
        field_constraints=True,
        wrap_string_literal=True,
        use_double_quotes=True,
        disable_timestamp=True,
    )
The problem is that in the generated Struct/TypedDict, DatumPage.datum_kwargs type is Dict[str, List], while it should instead be Dict[str, List[Any]].
The generated schema is the following ...
{
    "$defs": {
        "DataFrameForDatumPage": {
            "items": {
                "type": "string"
            },
            "title": "DataFrameForDatumPage",
            "type": "array"
        }
    },
    "additionalProperties": false,
    "description": "Page of documents to reference a quanta of externally-stored data",
    "properties": {
        "datum_id": {
            "$ref": "#/$defs/DataFrameForDatumPage",
            "description": "Array unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'"
        },
        "datum_kwargs": {
            "additionalProperties": {
                "items": {},
                "type": "array"
            },
            "description": "Array of arguments to pass to the Handler to retrieve one quanta of data",
            "title": "Datum Kwargs",
            "type": "object"
        },
        "resource": {
            "description": "The UID of the Resource to which all Datums in the page belong",
            "title": "Resource",
            "type": "string"
        }
    },
    "required": [
        "datum_id",
        "datum_kwargs",
        "resource"
    ],
    "title": "DatumPage",
    "type": "object"
}
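For reference, the `"items": {}` entry under `datum_kwargs` is the empty schema, which accepts any value, so `List[Any]` is the annotation one would expect for it. The following is a minimal sketch of that mapping, not the generator's actual implementation; `annotation_for` is a hypothetical helper written only for this illustration and covers just the schema keywords that appear above.

```python
from typing import Any, Dict


def annotation_for(schema: Dict[str, Any]) -> str:
    """Hypothetical helper: render a JSON Schema fragment as an annotation string."""
    if not schema:
        # The empty schema `{}` places no constraint on the value.
        return "Any"
    kind = schema.get("type")
    if kind == "string":
        return "str"
    if kind == "array":
        # A missing `items` keyword is treated like the empty schema.
        return f"List[{annotation_for(schema.get('items', {}))}]"
    if kind == "object":
        values = schema.get("additionalProperties", {})
        if not isinstance(values, dict):
            # Boolean `additionalProperties` is not modelled in this sketch.
            values = {}
        return f"Dict[str, {annotation_for(values)}]"
    return "Any"


datum_kwargs = {
    "additionalProperties": {"items": {}, "type": "array"},
    "type": "object",
}
print(annotation_for(datum_kwargs))  # Dict[str, List[Any]]
```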
... the generated msgspec.Struct ...
# generated by datamodel-codegen:
# filename: schema.json

from __future__ import annotations

from typing import Dict, List

from msgspec import Meta, Struct
from typing_extensions import Annotated

DataFrameForDatumPage = Annotated[List[str], Meta(title="DataFrameForDatumPage")]


class DatumPage(Struct):
    """
    Page of documents to reference a quanta of externally-stored data
    """

    datum_id: Annotated[
        DataFrameForDatumPage,
        Meta(
            description=(
                "Array unique identifiers for each Datum (akin to 'uid' for other"
                " Document types), typically formatted as '<resource>/<integer>'"
            )
        ),
    ]
    """
    Array unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'
    """
    datum_kwargs: Annotated[
        Dict[str, List],  # this should be Dict[str, List[Any]]
        Meta(
            description=(
                "Array of arguments to pass to the Handler to retrieve one quanta of"
                " data"
            ),
            title="Datum Kwargs",
        ),
    ]
    """
    Array of arguments to pass to the Handler to retrieve one quanta of data
    """
    resource: Annotated[
        str,
        Meta(
            description=(
                "The UID of the Resource to which all Datums in the page belong"
            ),
            title="Resource",
        ),
    ]
    """
    The UID of the Resource to which all Datums in the page belong
    """
... and typing.TypedDict equivalent.
# generated by datamodel-codegen:
# filename: schema.json

from __future__ import annotations

from typing import Dict, List, TypedDict

DataFrameForDatumPage = List[str]


class DatumPage(TypedDict):
    """
    Page of documents to reference a quanta of externally-stored data
    """

    datum_id: DataFrameForDatumPage
    """
    Array unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'
    """
    datum_kwargs: Dict[str, List]  # this should be Dict[str, List[Any]]
    """
    Array of arguments to pass to the Handler to retrieve one quanta of data
    """
    resource: str
    """
    The UID of the Resource to which all Datums in the page belong
    """
I'm not sure whether I'm missing an option in the generation script, whether this is intended behavior, or whether something is wrong with the pydantic.BaseModel used as the reference for generating the schema.
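To check whether a generated file is affected, one option is to scan its source for unparametrized generics. The sketch below walks the AST and reports every `List` or `Dict` name that is not the target of a subscript; `find_bare_generics` and `GENERIC_NAMES` are hypothetical names for this illustration, and the check only looks at the two generics discussed here.

```python
import ast
from typing import List, Tuple

GENERIC_NAMES = {"List", "Dict"}  # assumption: only these two are of interest


def find_bare_generics(source: str) -> List[Tuple[int, str]]:
    """Return (line number, name) for each generic used without parameters."""
    tree = ast.parse(source)
    # Collect the nodes that are being subscripted, e.g. the `List` in `List[str]`.
    subscripted = {
        id(node.value) for node in ast.walk(tree) if isinstance(node, ast.Subscript)
    }
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Name)
            and node.id in GENERIC_NAMES
            and id(node) not in subscripted
        ):
            hits.append((node.lineno, node.id))
    return sorted(hits)


sample = (
    "from typing import Dict, List\n"
    "datum_kwargs: Dict[str, List]\n"
    "datum_id: List[str]\n"
)
print(find_bare_generics(sample))  # [(2, 'List')]
```

Import statements are not flagged because imported names are `ast.alias` nodes rather than `ast.Name` nodes. Annotations written as string literals are not inspected by this sketch.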
EDIT: gave a more detailed description after the comment of @evalott100
DatumPage.datum_kwargs type is Dict[str, List], while it should instead be Dict[str, List[Any]]. I don't exactly know if I'm missing an option in the generation script or if it is an intended behavior.
Just to add a little context, from jsonschema we generate a series of TypedDicts (and maybe soon msgspec).
We'd like these to pass mypy type-checking with --strict, which a bare x: List (or List[Unknown]) fails.
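The difference is also visible at runtime. The short sketch below uses `typing.get_args` to show that the bare `List` in the generated annotation carries no type parameters, whereas `List[Any]` carries an explicit one. As far as I understand, `--strict` enables mypy's `disallow_any_generics` check, which is the one that rejects the bare form.

```python
from typing import Any, Dict, List, get_args

generated = Dict[str, List]      # what the generator currently emits
expected = Dict[str, List[Any]]  # what the original pydantic model declares

# The second type argument of each Dict is its value type.
generated_value_type = get_args(generated)[1]
expected_value_type = get_args(expected)[1]

print(get_args(generated_value_type))  # ()
print(get_args(expected_value_type))   # (typing.Any,)
```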
It'd be very handy if there were an option which would automatically use List[Any]/Dict[Any, Any] in converted fields from schema which don't have any constraints on the object/array elements.
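Until such an option exists, one possible workaround is to post-process the generated source. This is a rough sketch and not part of datamodel-code-generator: `parametrize_bare_generics` is a hypothetical helper that rewrites bare `List`/`Dict` with a regular expression and adds `Any` to the `typing` import when needed. Because it works on raw text, it would also rewrite those words inside comments or docstrings, so the result should be reviewed.

```python
import re

# Match `List` or `Dict` as whole words that are not followed by `[`.
_BARE = re.compile(r"\b(List|Dict)\b(?!\[)")
_FILL = {"List": "List[Any]", "Dict": "Dict[Any, Any]"}


def parametrize_bare_generics(source: str) -> str:
    """Hypothetical helper: fill in explicit Any parameters for bare generics."""
    lines = []
    changed = False
    for line in source.splitlines(keepends=True):
        if line.startswith(("from ", "import ")):
            # Leave import statements untouched.
            lines.append(line)
            continue
        new_line = _BARE.sub(lambda m: _FILL[m.group(1)], line)
        changed = changed or new_line != line
        lines.append(new_line)
    text = "".join(lines)
    has_any = re.search(r"^from typing import .*\bAny\b", text, flags=re.M)
    if changed and not has_any:
        # Add `Any` to the first `from typing import ...` line.
        text = re.sub(
            r"^from typing import ", "from typing import Any, ", text,
            count=1, flags=re.M,
        )
    return text


sample = (
    "from typing import Dict, List, TypedDict\n"
    "\n"
    "class DatumPage(TypedDict):\n"
    "    datum_kwargs: Dict[str, List]\n"
)
print(parametrize_bare_generics(sample))
```

In the pipeline above, this could be applied to each output file after `generator.generate(...)` returns, by reading the file, transforming its text, and writing it back.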
This is still pending; since there's no reply yet I may try to squeeze some time into it. Would it be possible to have some input on where to start searching for the problem?