datamodel-code-generator
Incorrect regex in generated pydantic model, when regex pattern is under oneOf/anyOf nested inside a property of a schema
Describe the bug
If I include a regex pattern under a oneOf/anyOf that is nested inside a property of a schema, the resulting pydantic model has an extra space after each comma in the regex. This doesn't happen without oneOf/anyOf, or when the oneOf/anyOf is at the top level of the schema.
To Reproduce
Example schema:
openapi: "3.0.0"
components:
  schemas:
    Model1:
      description: Model1
      properties:
        strAttribute:
          type: string
          pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
    Model2:
      description: Model2
      properties:
        strOrIntAttribute:
          oneOf:
            - type: int
            - type: string
              pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
    StrOrIntAttribute:
      description: StrOrIntAttribute
      oneOf:
        - type: int
        - type: string
          pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
Used commandline:
$ datamodel-codegen --input api.yaml --output model.py
Expected behavior
For Model2, I expected the following pydantic model to be created:
class Model2(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')]
    ] = None
But instead, I get this one (note the extra whitespace after each ","):
class Model2(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=, ]+=(?:on|off)+(?:, [^=, ]+=(?:on|off)+)*$')]
    ] = None
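To illustrate why the extra spaces matter, here is a minimal sketch that compares the two patterns from the models above. It assumes the constraint is checked the way `re.match` would check it, and the helper name `accepts` and the sample strings are made up for this example.

```python
import re

# Pattern as written in the schema.
EXPECTED = r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
# Pattern as emitted by the generator (a space after each comma).
GENERATED = r'^$|[^=, ]+=(?:on|off)+(?:, [^=, ]+=(?:on|off)+)*$'


def accepts(pattern: str, value: str) -> bool:
    """Return True if the value matches the pattern from the start of the string.

    Assumption: re.match is used here as a stand-in for the model's validation.
    """
    return re.match(pattern, value) is not None


# Hypothetical sample inputs, not taken from the original report.
for sample in ("a=on,b=off", "my key=on"):
    print(sample, accepts(EXPECTED, sample), accepts(GENERATED, sample))
```

Both samples satisfy the schema's pattern but are rejected by the generated one: the first because the generated pattern now requires a space after the separating comma, the second because a space is no longer allowed inside a key.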
Version:
- OS: macOS
- Python version: 3.8.10
- datamodel-code-generator version: 0.25.1
Somewhat reproduced on current master with Python 3.13 on Linux:
[2] [kkini@kkini:/tmp]$ python -V
Python 3.13.1
[2] [kkini@kkini:/tmp]$ datamodel-codegen --version
0.30.1.dev1+gc0e19d92
[2] [kkini@kkini:/tmp]$ cat api.yaml
openapi: "3.0.0"
components:
  schemas:
    Model:
      description: Model
      properties:
        strOrIntAttribute:
          oneOf:
            - type: int
            - type: string
              pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
[2] [kkini@kkini:/tmp]$ datamodel-codegen --input-file-type openapi --input api.yaml --output model.py
[2] [kkini@kkini:/tmp]$ cat model.py
# generated by datamodel-codegen:
# filename: api.yaml
# timestamp: 2025-04-23T21:14:39+00:00
from __future__ import annotations
from typing import Any, Optional, Union
from pydantic import BaseModel, constr
class Model(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:, [^=,]+=(?:on|off)+)*$')]
    ] = None
So on master, the output regex string is closer to the original input from the YAML, but still not identical:
- '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
+ '^$|[^=,]+=(?:on|off)+(?:, [^=,]+=(?:on|off)+)*$'
I added a PR that avoids splitting inside constraint strings. With it, the resulting regex looks fine:
# generated by datamodel-codegen:
# filename: api.yaml
# timestamp: 2025-05-23T05:47:25+00:00
from __future__ import annotations
from typing import Any, Optional, Union
from pydantic import BaseModel, constr
class Model(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')]
    ] = None
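For readers curious what "avoid splitting inside constraint strings" can look like, here is a rough sketch of quote-aware comma splitting. It is not the code from the PR: the function name `split_outside_quotes` is hypothetical, and the naive split-and-rejoin shown for comparison is only an assumption about how the extra spaces might arise.

```python
from typing import List, Optional


def split_outside_quotes(text: str, sep: str = ",") -> List[str]:
    """Split text on sep, ignoring separators inside quoted string literals.

    Hypothetical helper for illustration; not part of datamodel-code-generator.
    """
    parts: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # the quote character we are inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep a backslash-escaped character so it cannot end the literal.
                current.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == sep:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


inner = "Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')"

# Naive approach: every comma is treated as a separator, including those in the regex.
naive = ", ".join(part.strip() for part in inner.split(","))
# Quote-aware approach: commas inside the r'...' literal are left untouched.
safe = ", ".join(split_outside_quotes(inner))

print(naive)
print(safe)
```

In this sketch the naive join reproduces the spaced-out regex reported for 0.25.1, while the quote-aware version round-trips the type arguments unchanged.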