datamodel-code-generator
Dataclass generation does not include specified string format like uuid
Describe the bug
When I generate my models as dataclasses, the generator does not consider string formats like uuid, date, ... The output is just of type `str`, which is different from what I get when I generate Pydantic models.
To Reproduce
Example schema:
```json
{
  "openapi": "3.0.3",
  "components": {
    "schemas": {
      "Input": {
        "type": "object",
        "additionalProperties": false,
        "required": ["id"],
        "properties": {
          "id": {
            "type": "string",
            "format": "uuid"
          }
        }
      }
    }
  }
}
```
The generated result:
```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Input:
    id: str
```
Command line used:
```shell
$ datamodel-codegen --input ../api/openapi.json --output test.py --output-model-type dataclasses.dataclass
```
Expected behavior
I would expect the `id` field to be typed as `UUID`, as it is when I generate Pydantic models:
```python
from __future__ import annotations

from uuid import UUID

from pydantic import BaseModel, ConfigDict


class Input(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    id: UUID
```
Version:
- OS: win10
- Python version: 3.11.2
- datamodel-code-generator version: 0.25.1
@maddin1991
> When I generate my models as dataclasses, the generator does not consider string formats like uuid, date, ... The output is just of type string, which is different from when I generate Pydantic models.

This is a very good question.
I envisioned taking data defined in a schema and importing or dumping it into an output data class.
Since a `dataclass` does not cast values, I figured that a plain string or number would be the better type to use!
However, what you say makes sense, and defining a type such as `uuid` is not a bad idea.
How about offering this as a CLI option?
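To illustrate what "does not cast" means here: a standard-library `dataclass` stores whatever value it is given, regardless of the annotation. A minimal sketch, assuming the `Input` class were generated with a `UUID` annotation and fed the raw decoded JSON:

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass
class Input:
    id: UUID  # annotation only; dataclasses neither validate nor convert it


raw = {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11"}  # e.g. a decoded JSON message
obj = Input(**raw)

# The value is stored as-is: it is still a str, not a UUID.
print(type(obj.id).__name__)  # str
```

Pydantic, by contrast, parses the string into a `UUID` when the model is constructed, which is why the two output types behave differently.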
I'm not deep enough into dataclasses to understand what you mean by "they don't cast", but I would expect the same behavior as when generating Pydantic models. A CLI option would be great, but in my opinion the expectation is that this option is enabled by default.
Could you take this opportunity to explain why there is an improved-datamodel-codegen, which explicitly lists many string formats as implemented, while this repo does not? What's the reason for having these two versions in parallel, or is there even a difference? The differences are not clear to me from the docs.
Thank you and Merry Christmas! :)
@maddin1991
> I'm not deep enough into dataclasses to understand what you mean by "they don't cast", but I would expect the same behavior as when generating Pydantic models. A CLI option would be great, but in my opinion the expectation is that this option is enabled by default.

My assumption is that this tool will be used in applications such as APIs and message producers. In that case, let's say the schema contains a field with the uuid format, and the received messages are mapped to a dataclass. You would receive a string, but the dataclass would ask for a `uuid.UUID`. Isn't that a little difficult to use?
I understand what you are saying, but I considered it quite inconvenient in actual use. However, I think we need a model output that reflects the format you are talking about (whether as an option or as the default).
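One hand-written way to keep the stricter `UUID` annotation and still accept the string received in a message is to convert it in `__post_init__`. This is a sketch only; it is not something datamodel-code-generator emits:

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass
class Input:
    id: UUID

    def __post_init__(self) -> None:
        # Accept the string form received over the wire and normalize it to UUID.
        if isinstance(self.id, str):
            self.id = UUID(self.id)


message = {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11"}
obj = Input(**message)
print(repr(obj.id))  # UUID('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')
```

Writing such a hook for every field and every format is exactly the per-model overhead that makes a strict dataclass inconvenient without some generic decoding step.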
> Could you take this opportunity to explain why there is an improved-datamodel-codegen, which explicitly lists many string formats as implemented, while this repo does not?

I don't know that repo 🤔 Who manages the PyPI package?
> which explicitly lists many string formats as implemented

Could you show me the list?
> My assumption is that this tool will be used in applications such as APIs and message producers. In that case, let's say the schema contains a field with the uuid format, and the received messages are mapped to a dataclass. You would receive a string, but the dataclass would ask for a `uuid.UUID`. Isn't that a little difficult to use?
> I understand what you are saying, but I considered it quite inconvenient in actual use.

Your assumptions seem very reasonable to me. However, I also find it unintuitive to lose information contained in the OpenAPI specification with the `dataclass` option when the library is clearly able to preserve it, as seen in the Pydantic case.
For context, here is a use case that would greatly benefit from the more detailed type hint:
```python
# %% Imports
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID

from mashumaro import DataClassDictMixin


# %% Generated code
@dataclass
class Input(DataClassDictMixin):
    id: UUID


# %% Usage
api_input_dict = {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11"}
input_object = Input.from_dict(api_input_dict)
assert isinstance(input_object.id, UUID)  # passes: the string was decoded into a UUID
```
As you rightly say, using `dataclass` without some means of converting a received string into the specified type would be inconvenient. `mashumaro` is able to derive the code necessary for that conversion from a `dataclass` definition with sufficiently detailed type hints, and since `datamodel-code-generator` allows customization of the base class, there already is a convenient way to integrate this.
About mashumaro:
> This library works by taking the schema of the data and generating a specific decoder and encoder for exactly that schema, taking into account the specifics of serialization format.

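As a rough illustration of the underlying idea only (this is not how mashumaro is implemented; it generates dedicated decoder code per class), a stdlib-only sketch can read a dataclass's type hints at runtime and convert known string formats. The `CONVERTERS` table, the `from_dict` helper, and the `created` field are hypothetical, and the sketch handles only flat dataclasses:

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Any, Callable, Dict, Type, TypeVar, get_type_hints
from uuid import UUID

T = TypeVar("T")

# Hypothetical table: field annotation -> parser for its JSON string form.
CONVERTERS: Dict[Any, Callable[[str], Any]] = {
    UUID: UUID,
    date: date.fromisoformat,
}


def from_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build a flat dataclass from a dict, converting strings based on field type hints."""
    hints = get_type_hints(cls)
    kwargs: Dict[str, Any] = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # let the dataclass default (if any) apply
        value = data[f.name]
        convert = CONVERTERS.get(hints[f.name])
        if convert is not None and isinstance(value, str):
            value = convert(value)
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class Input:
    id: UUID
    created: date  # hypothetical extra field to show a second format


obj = from_dict(Input, {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11", "created": "2023-12-24"})
print(type(obj.id).__name__, type(obj.created).__name__)  # UUID date
```

The point is that the conversion logic depends entirely on the annotations, which is why a generated `UUID` annotation is more useful than `str` for this kind of decoder.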
Thank you for considering this scenario.
@J-L0 I knew mashumaro existed, but I didn't know it could do this. Thanks for letting me know. A strict type reflecting the format is certainly needed for dataclasses as well, in the way you presented it. So we will add a new option (to avoid a breaking change), and let's also describe how to parse with mashumaro in the documentation.
I got here because I would like to have `format: ulid`.
Looking at the code, it looks like the uuid format is already supported (at least it's in https://github.com/koxudaxi/datamodel-code-generator/blob/af7aa97fc661f6ff5e73ec5e027cb6ad43244179/datamodel_code_generator/parser/jsonschema.py#L125 and https://github.com/koxudaxi/datamodel-code-generator/blob/af7aa97fc661f6ff5e73ec5e027cb6ad43244179/datamodel_code_generator/model/pydantic/types.py#L81), so if I'm not misreading the code, this issue could be closed.
I think it would be nice if it were possible to extend `json_schema_data_formats` and `type_map_factory()` with custom formats. It doesn't seem like it would be too much effort to extend those two places by providing some custom code.
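The general shape of such an extension point can be sketched as a registry keyed by JSON Schema type and format. This is a standalone, hypothetical sketch: `FORMAT_MAP`, `register_format`, `resolve`, and `import_line` are illustrative names and do not use or reproduce datamodel-code-generator's `json_schema_data_formats` or `type_map_factory()`. The `ulid` entry assumes a third-party package exposing `ulid.ULID`:

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: (JSON Schema type, format) -> (import module, Python type name).
FORMAT_MAP: Dict[Tuple[str, str], Tuple[Optional[str], str]] = {
    ("string", "uuid"): ("uuid", "UUID"),
    ("string", "date"): ("datetime", "date"),
    ("string", "date-time"): ("datetime", "datetime"),
}

# Plain annotation used when no format mapping applies.
FALLBACK = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def register_format(json_type: str, fmt: str, module: Optional[str], name: str) -> None:
    """Add or override a mapping, e.g. a project-specific 'ulid' format."""
    FORMAT_MAP[(json_type, fmt)] = (module, name)


def resolve(json_type: str, fmt: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (module to import, annotation) for a schema type/format pair."""
    if fmt is not None and (json_type, fmt) in FORMAT_MAP:
        return FORMAT_MAP[(json_type, fmt)]
    return (None, FALLBACK[json_type])


def import_line(json_type: str, fmt: Optional[str]) -> Optional[str]:
    """Render the import a generated model would need, if any."""
    module, name = resolve(json_type, fmt)
    return f"from {module} import {name}" if module else None


register_format("string", "ulid", "ulid", "ULID")  # assumes a package providing ulid.ULID
print(resolve("string", "ulid"))      # ('ulid', 'ULID')
print(resolve("string", "email"))     # (None, 'str')
print(import_line("string", "ulid"))  # from ulid import ULID
```

Unknown formats fall back to the plain type, so adding an entry only changes the output for schemas that actually use that format.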