datamodel-code-generator
Field name changing in Pydantic V2 model
Describe the bug
I think this is due to the changes in this PR: https://github.com/koxudaxi/datamodel-code-generator/pull/2355
We create the Pydantic model from a JSON schema and then use that model, but _1 has been appended to the field name. Is it appropriate behaviour to change the field name?
Is this so that the field name isn't the same as the enumeration class name? If so, maybe we keep the field name the same and use Enum as a suffix for the enumeration class instead?
To Reproduce
Running this:
import json
import pathlib
from datamodel_code_generator import DataModelType, InputFileType, generate
schema = {"title": "Test", "properties": {"Fruit": {"enum": ["apple", "banana"]}}}
output = pathlib.Path(__file__).parent / "test_model.py"
generate(
    json.dumps(schema),
    input_file_type=InputFileType.JsonSchema,
    input_filename="example.json",
    output=output,
    output_model_type=DataModelType.PydanticV2BaseModel,
    capitalise_enum_members=True,
)
produces
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2025-03-26T17:43:52+00:00
from __future__ import annotations
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field
class Fruit(Enum):
    APPLE = 'apple'
    BANANA = 'banana'
class Test(BaseModel):
    Fruit_1: Optional[Fruit] = Field(None, alias='Fruit')
Example schema:
{
"title": "Test",
"properties": {
"Fruit": {
"enum": [
"apple",
"banana"
]
}
}
}
Expected behaviour
Maybe instead we could produce:
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2025-03-26T17:43:52+00:00
from __future__ import annotations
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field
class FruitEnum(Enum):
    APPLE = 'apple'
    BANANA = 'banana'
class Test(BaseModel):
    Fruit: Optional[FruitEnum] = Field(None, alias='Fruit')
Version:
- OS: Ubuntu 22.04.5 LTS
- Python version: 3.13.2
- datamodel-code-generator version: 0.28.5
@ollyhensby
> Is this so that the field name isn't the same as the enumeration class name? If so, maybe we keep the field name the same and use Enum as a suffix for the enumeration class instead?
Yes, that's correct. Are you suggesting we should prioritize preserving the field name over the class name?
I've been thinking about this issue as well. While in your example the field name with the _1 suffix looks odd, it's worth noting that the field still maintains the correct external representation through the alias.
Here are some considerations:
1. Current behavior (adding a suffix to the field name):
   - Impact is limited to specific fields
   - Changes are explicit and contained
   - Original JSON structure is preserved via aliases
2. Proposed solution (adding an Enum suffix to the class name):
   - Class/model names can be referenced by multiple other classes
   - If future field names conflict with type names, it might require more extensive code changes
   - Could potentially affect more existing code
3. Alternative approach:
   - Create type aliases with suffixes before the class definitions
   - Use these aliased types in field type hints
Which approach do you think would be the best behavior for the library?
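To make the trade-off between the first two options concrete, here is a toy sketch. It is not datamodel-code-generator's implementation; FieldSpec, suffix_field and suffix_class are hypothetical names, and each model field is reduced to its JSON key plus the class name used in its annotation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str                    # Python attribute name on the model
    type_name: str               # class referenced in the annotation
    alias: Optional[str] = None  # original JSON key, set only when it differs from name


def suffix_field(props: dict[str, str]) -> list[FieldSpec]:
    """Option 1: keep class names; rename only the clashing field and keep the JSON key as alias."""
    specs = []
    for key, type_name in props.items():
        if key == type_name:
            specs.append(FieldSpec(f"{key}_1", type_name, alias=key))
        else:
            specs.append(FieldSpec(key, type_name))
    return specs


def suffix_class(props: dict[str, str], suffix: str = "Enum") -> list[FieldSpec]:
    """Option 2: keep field names; rename the clashing class, which changes every reference to it."""
    renamed = {t: t + suffix for key, t in props.items() if key == t}
    return [FieldSpec(key, renamed.get(t, t)) for key, t in props.items()]


# "Fruit" clashes with its own enum class; "Basket" reuses the same enum without clashing.
props = {"Fruit": "Fruit", "Basket": "Fruit"}
print(suffix_field(props))  # only the Fruit field is renamed (to Fruit_1, alias "Fruit")
print(suffix_class(props))  # both fields now point at FruitEnum
```

In this model, option 1 touches a single attribute and keeps the JSON key through the alias, while option 2 renames the class for every field that references it, including fields that never clashed.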
I recently got hit by this one. What I did is, I think, actually more proper in the end.
Originally the schema was something like:
{
"type": "object",
"properties": {
"foo": {
"type": "object"
}
}
}
I changed it to
{
"type": "object",
"properties": {
"foo": {
"$ref": "#/definitions/FooType"
}
},
"definitions": {
"FooType": {
"type": "object"
}
}
}
I say it's a bit more proper in that it separates the internal schema to go at a higher level and be named and potentially reused.
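When the schema comes from someone else, the same hoisting could be applied as a preprocessing step before passing the schema to generate(). A minimal standard-library sketch follows; hoist_inline_enums and the "Enum" suffix are hypothetical choices, and it assumes the generator names the class after the definitions key, so the field keeps its original name while the class gets the suffix.

```python
import copy
import json


def hoist_inline_enums(schema: dict, suffix: str = "Enum") -> dict:
    """Return a copy of schema where each inline enum property is moved into
    "definitions" and replaced by a $ref. The definition key is the property
    name plus suffix (a hypothetical naming choice)."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    properties = result.get("properties", {})
    definitions = result.setdefault("definitions", {})
    for prop_name, prop_schema in list(properties.items()):
        if "enum" not in prop_schema or "$ref" in prop_schema:
            continue  # only hoist inline enums
        def_name = prop_name + suffix
        if def_name in definitions:
            raise ValueError(f"definition {def_name!r} already exists")
        definitions[def_name] = prop_schema
        properties[prop_name] = {"$ref": f"#/definitions/{def_name}"}
    return result


if __name__ == "__main__":
    schema = {"title": "Test", "properties": {"Fruit": {"enum": ["apple", "banana"]}}}
    print(json.dumps(hoist_inline_enums(schema), indent=2))
```

The JSON string printed here is the kind of input the reproduction script passes to generate() via json.dumps.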
> I say it's a bit more proper in that it separates the internal schema to go at a higher level and be named and potentially reused.
That is what I do as well for API specs that I write. However, I often have to work with APIs that I didn't define and that don't follow this pattern. In my case, datamodel-codegen generates a total of 8 "Type" classes, all enums, such as:
class Type5(Enum):
    debit = "Debit"
    credit = "Credit"
This is pretty useless. If I use Type5 and at a future date this API spec adds a new operation with a new "Type" before this one, Type5 will suddenly start meaning something completely different.
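A toy model of that instability, assuming (as the Type5 name suggests) that repeated names receive a counter suffix in the order they are encountered. sequential_names and path_names are hypothetical helpers, not library APIs, and the schema paths are made up; path_names shows one order-independent alternative that derives the name from the schema location.

```python
from __future__ import annotations


def sequential_names(enum_paths: list[str], base: str = "Type") -> dict[str, str]:
    """Toy counter-based naming: the first enum gets base, later ones base1, base2, ..."""
    return {path: base if i == 0 else f"{base}{i}" for i, path in enumerate(enum_paths)}


def path_names(enum_paths: list[str]) -> dict[str, str]:
    """Alternative: build the name from the schema location, so it does not depend on order."""
    return {
        path: "".join(seg[:1].upper() + seg[1:] for seg in path.split("/") if seg)
        for path in enum_paths
    }


before = ["/accounts/type", "/transactions/type"]
after = ["/accounts/type", "/refunds/type", "/transactions/type"]  # new "type" enum added earlier

print(sequential_names(before)["/transactions/type"])  # Type1
print(sequential_names(after)["/transactions/type"])   # Type2: same enum, new name
print(path_names(after)["/transactions/type"])         # TransactionsType in both cases
```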
@koxudaxi, I am not sure which solution would be better; after some more thought, renaming Enums may end up producing similar issues.
As you've said, with the current behaviour the original JSON structure is preserved via aliases, and that resolves my use case, since I can pass by_alias=True to model_dump.
I also think this issue is an edge case that comes from my poorly defined schema. It would be better if my example schema were defined more clearly, like this:
{
"type": "object",
"properties": {"fruit": {"$ref": "#/definitions/Fruit"}},
"definitions": {
"Fruit": {
"enum": ["apple", "banana"],
"description": "Fruit",
"title": "Fruit",
"type": "string"
}
}
}
which is what @trajano suggested in his response.
In the schema above, the enumeration is clearly defined, and so is the property (which now has a different name, fruit). With this schema, the issue does not occur.
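A rough way to spot this situation before running the generator is to check whether a property's name equals the class name it would be given. This is a toy heuristic, not the library's logic: find_field_class_clashes is a hypothetical helper, and it assumes an inline enum becomes a class named after the capitalised property (as Fruit did in the report) and a $ref property uses the last segment of the reference.

```python
from __future__ import annotations


def find_field_class_clashes(schema: dict) -> list[str]:
    """List properties whose generated class would share the field's name (toy model)."""
    clashes = []
    for name, prop in schema.get("properties", {}).items():
        if "$ref" in prop:
            # assumed: a referenced schema becomes a class named after the last $ref segment
            class_name = prop["$ref"].rsplit("/", 1)[-1]
        elif "enum" in prop:
            # assumed: an inline enum becomes a class named after the capitalised property name
            class_name = name[:1].upper() + name[1:]
        else:
            continue
        if class_name == name:
            clashes.append(name)
    return clashes


original = {"title": "Test", "properties": {"Fruit": {"enum": ["apple", "banana"]}}}
revised = {
    "type": "object",
    "properties": {"fruit": {"$ref": "#/definitions/Fruit"}},
    "definitions": {"Fruit": {"enum": ["apple", "banana"], "type": "string"}},
}
print(find_field_class_clashes(original))  # ['Fruit']
print(find_field_class_clashes(revised))   # []
```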
FYI, this error is related to one that I came across a while ago: https://github.com/koxudaxi/datamodel-code-generator/issues/2091
You fixed my issue by adding a "_x" suffix to the field name.
From my point of view, option 2 (or maybe 3, though I'm not sure I understand it clearly) from the considerations listed above would be better; I think preserving field names is more important than preserving Enum names.
Either way, I'm very happy the issue has been addressed. Thanks!