datamodel-code-generator

Dataclass generation does not include specified string format like uuid

Open maddin1991 opened this issue 1 year ago • 7 comments

Describe the bug

When I generate my models as dataclasses, the generator does not consider string formats like uuid, date ... The output is just of type string, which is different from what I get when I generate pydantic models.

To Reproduce

Example schema:


{
  "openapi": "3.0.3",
  "components": {
    "schemas": {
      "Input": {
        "type": "object",
        "additionalProperties": false,
        "required": ["id"],
        "properties": {
          "id": {
            "type": "string",
            "format": "uuid"
          }
        }
      }
    }
  }
}

The generated result:

from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Input:
    id: str

Used commandline:

$ datamodel-codegen --input ../api/openapi.json --output test.py --output-model-type dataclasses.dataclass

Expected behavior

I would expect the id field to be of type UUID, as it is when I generate pydantic models:

from __future__ import annotations
from uuid import UUID
from pydantic import BaseModel, ConfigDict


class Input(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    id: UUID

Version:

  • OS: win10
  • Python version: 3.11.2
  • datamodel-code-generator version: 0.25.1

maddin1991 avatar Dec 21 '23 16:12 maddin1991

@maddin1991

When I generate my models as dataclasses, the generator does not consider string formats like uuid, date ... The output is just of type string, which is different from what I get when I generate pydantic models.

This is a very good question. I envisioned taking data defined in a schema and importing or dumping it into an output dataclass. Since a dataclass does not cast values to their annotated types, I figured that a plain string or number would be a better representation to use!

However, what you say makes sense, and defining a type such as uuid is not a bad idea. How about this being offered as a CLI option?

koxudaxi avatar Dec 24 '23 03:12 koxudaxi

I'm not deep enough into dataclasses to understand what you mean by saying they don't cast, but I would expect the same behaviour as when generating pydantic models. A CLI option would be great, but in my opinion the expectation is that this option is enabled by default.

Could you take this opportunity to explain why there is an improved-datamodel-codegen which explicitly lists many string formats as implemented while this repo does not? What's the reason for having these two versions in parallel, or is there even a difference? The differences are not clear to me from the docs.

Thank you and merry Christmas! :)

maddin1991 avatar Dec 24 '23 10:12 maddin1991

@maddin1991

I'm not deep enough into dataclasses to understand what you mean by saying they don't cast, but I would expect the same behaviour as when generating pydantic models. A CLI option would be great, but in my opinion the expectation is that this option is enabled by default.

My assumption is that this tool will be used in applications such as APIs and message producers. In that case, let's say the schema contains a uuid format, and the received messages are mapped to a dataclass. You would have received a string, but the dataclass would ask for a uuid.UUID. Isn't this a little difficult to use?

I understand what you are saying, but I considered it quite inconvenient in actual use. However, I think we need a model output that reflects the format you are talking about (whether as an option or as the default).
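The point about casting can be illustrated with a minimal standard-library sketch. It is only an illustration of plain dataclass behaviour, not output of datamodel-code-generator; the `payload` dict is an assumed example message.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass
class Input:
    # The annotation is only a hint; dataclasses neither check nor convert it.
    id: UUID


# Assumed example of a decoded JSON message: the id arrives as a plain string.
payload = {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11"}

# The string is stored as-is, despite the UUID annotation.
raw = Input(**payload)

# Without a helper library, the caller has to convert explicitly.
converted = Input(id=UUID(payload["id"]))
```

Here `raw.id` is still a `str`, while `converted.id` is a `UUID`. This is the inconvenience described above: a stricter annotation only pays off when something performs the conversion.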

Could you take this opportunity to explain why there is an improved-datamodel-codegen which explicitly lists many string formats as implemented while this repo does not?

I don't know the repo 🤔 Who manages the PyPI package?

which explicitly lists many string formats as implemented

Could you show me the list?

koxudaxi avatar Dec 27 '23 03:12 koxudaxi

My assumption is that this tool will be used in applications such as APIs and message producers. In that case, let's say the schema contains a uuid format, and the received messages are mapped to a dataclass. You would have received a string, but the dataclass would ask for a uuid.UUID. Isn't this a little difficult to use?

I understand what you are saying, but I considered it quite inconvenient in actual use.

Your assumptions seem very reasonable to me. However, I also find it unintuitive to lose information that is contained in the OpenAPI specification when using the dataclass option, given that the library is clearly able to preserve it, as seen in the Pydantic case.

For context, here is a use case that would greatly benefit from the more detailed type hint:

# %% Imports
from __future__ import annotations
from dataclasses import dataclass
from uuid import UUID
from mashumaro import DataClassDictMixin


# %% Generated code
@dataclass
class Input(DataClassDictMixin):
    id: UUID


# %% Usage
api_input_dict = {"id": "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11"}

input_object = Input.from_dict(api_input_dict)

assert isinstance(input_object.id, UUID)  # True

As you rightly say, using a dataclass without some means of converting a received string into the specified type would be inconvenient. mashumaro is able to derive the code necessary for that conversion from a dataclass definition with sufficiently detailed type hints, and since datamodel-code-generator allows customization of the base class, there is already a convenient way to integrate this.

About Mashumaro: "This library works by taking the schema of the data and generating a specific decoder and encoder for exactly that schema, taking into account the specifics of serialization format."

Thank you for considering this scenario.

J-L0 avatar Dec 27 '23 11:12 J-L0

@J-L0 I knew Mashumaro existed, but I didn't know it could do this. Thanks for letting me know. Certainly, a strict type reflecting the format is needed for dataclasses as well, in the way you presented it. So we will add a new option (to avoid breaking changes), and let's also describe how to parse with Mashumaro in the documentation.

koxudaxi avatar Dec 27 '23 13:12 koxudaxi

I got here as I would like to have format: ulid.

Looking at the code, it looks like uuid format is already supported (at least it's in https://github.com/koxudaxi/datamodel-code-generator/blob/af7aa97fc661f6ff5e73ec5e027cb6ad43244179/datamodel_code_generator/parser/jsonschema.py#L125 and https://github.com/koxudaxi/datamodel-code-generator/blob/af7aa97fc661f6ff5e73ec5e027cb6ad43244179/datamodel_code_generator/model/pydantic/types.py#L81) so if I'm not misreading the code, this issue could be closed.

I think it would be nice if it were possible to extend json_schema_data_formats and type_map_factory() with custom formats. It doesn't seem like it'd be too much effort to extend just those two places by providing some custom code.
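As a rough sketch of what such an extension point might look like: the code below is purely hypothetical and does not reflect the actual structure of json_schema_data_formats or type_map_factory(). The names `DEFAULT_STRING_FORMATS`, `resolve_string_type` and the stand-in `ULID` class are all assumptions made for illustration.

```python
from datetime import date
from typing import Dict, Optional
from uuid import UUID


class ULID(str):
    """Hypothetical stand-in; a real project would import a ULID type from a package."""


# Hypothetical built-in mapping from JSON Schema string formats to Python types.
DEFAULT_STRING_FORMATS: Dict[str, type] = {"uuid": UUID, "date": date}


def resolve_string_type(
    fmt: Optional[str],
    custom_formats: Optional[Dict[str, type]] = None,
) -> type:
    """Pick the Python type for a string schema; unknown formats fall back to str."""
    # User-supplied formats are merged last so they can extend or override the defaults.
    formats = {**DEFAULT_STRING_FORMATS, **(custom_formats or {})}
    return formats.get(fmt, str)


# Without an extension, "ulid" is unknown and degrades to a plain string.
plain = resolve_string_type("ulid")

# With a user-supplied mapping, the custom format resolves to the custom type.
extended = resolve_string_type("ulid", {"ulid": ULID})
```

The design choice in this sketch is that custom formats are merged over the defaults, so a user could add ulid without touching the built-in table.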

Panaetius avatar Aug 08 '24 09:08 Panaetius