Incorrect regex in generated pydantic model, when regex pattern is under oneOf/anyOf nested inside a property of a schema

Open pfaizian opened this issue 7 months ago • 2 comments

Describe the bug If I include a regex pattern under a oneOf/anyOf that is nested inside a property of a schema, the resulting pydantic model has an extra space after each comma in the regex. This doesn't happen without oneOf/anyOf, or when the oneOf/anyOf is at the top level of the schema.

To Reproduce

Example schema:


openapi: "3.0.0"
components:
  schemas:
    Model1:
      description: Model1
      properties:
        strAttribute:
          type: string
          pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
    Model2:
      description: Model2
      properties:
        strOrIntAttribute:
          oneOf:
            - type: int
            - type: string
              pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
    StrOrIntAttribute:
      description: StrOrIntAttribute
      oneOf:
        - type: int
        - type: string
          pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'

Used commandline:

$ datamodel-codegen --input api.yaml --output model.py

Expected behavior For Model2, I expected the following pydantic model to be generated:

class Model2(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')]
    ] = None

But instead, I get this one (note the extra whitespace after each ","):

class Model2(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=, ]+=(?:on|off)+(?:, [^=, ]+=(?:on|off)+)*$')]
    ] = None
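
The extra spaces are not cosmetic: they change which strings the pattern accepts. The following is a minimal check using only Python's `re` module. It assumes start-anchored matching via `re.match`, and the `accepts` helper and the sample values are illustrative rather than taken from the generator or from pydantic.

```python
import re

# Pattern as written in the schema, and as emitted by version 0.25.1 above.
SCHEMA_PATTERN = r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
GENERATED_PATTERN = r'^$|[^=, ]+=(?:on|off)+(?:, [^=, ]+=(?:on|off)+)*$'


def accepts(pattern: str, value: str) -> bool:
    """Return True if the pattern matches from the start of the value."""
    return re.match(pattern, value) is not None


# Illustrative inputs: empty string, one pair, two pairs, a key with a space.
for sample in ["", "a=on", "a=on,b=off", "my key=on"]:
    print(
        repr(sample),
        "schema:", accepts(SCHEMA_PATTERN, sample),
        "generated:", accepts(GENERATED_PATTERN, sample),
    )
```

With these inputs, `a=on,b=off` and `my key=on` are accepted by the schema pattern but rejected by the generated one, because the generated pattern requires a space after the separating comma and excludes spaces from keys.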

Version:

  • OS: macOS
  • Python version: 3.8.10
  • datamodel-code-generator version: 0.25.1

pfaizian, Apr 23 '25 19:04

Partially reproduced on current master with Python 3.13 on Linux:

[2] [kkini@kkini:/tmp]$ python -V
Python 3.13.1

[2] [kkini@kkini:/tmp]$ datamodel-codegen --version
0.30.1.dev1+gc0e19d92

[2] [kkini@kkini:/tmp]$ cat api.yaml 
openapi: "3.0.0"
components:
  schemas:
    Model:
      description: Model
      properties:
        strOrIntAttribute:
          oneOf:
            - type: int
            - type: string
              pattern: '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
 
[2] [kkini@kkini:/tmp]$ datamodel-codegen --input-file-type openapi --input api.yaml --output model.py

[2] [kkini@kkini:/tmp]$ cat model.py 
# generated by datamodel-codegen:
#   filename:  api.yaml
#   timestamp: 2025-04-23T21:14:39+00:00

from __future__ import annotations

from typing import Any, Optional, Union

from pydantic import BaseModel, constr


class Model(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:, [^=,]+=(?:on|off)+)*$')]
    ] = None

So on master, the output regex string is closer to the original input from the YAML, but still not identical:

- '^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
+ '^$|[^=,]+=(?:on|off)+(?:, [^=,]+=(?:on|off)+)*$'
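
To pin down exactly what still differs, the two strings can be compared programmatically. This is a small sketch using the standard-library `difflib` and `re` modules; the variable names and the sample value `a=on,b=off` are illustrative assumptions, not part of the generator.

```python
import difflib
import re

# Pattern from the YAML input, and the one emitted on master above.
SCHEMA_PATTERN = r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$'
MASTER_PATTERN = r'^$|[^=,]+=(?:on|off)+(?:, [^=,]+=(?:on|off)+)*$'

# Collect every non-equal edit needed to turn the schema pattern into the output.
matcher = difflib.SequenceMatcher(None, SCHEMA_PATTERN, MASTER_PATTERN, autojunk=False)
changes = [
    (tag, SCHEMA_PATTERN[i1:i2], MASTER_PATTERN[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
print(changes)

# The remaining difference still affects matching of comma-separated pairs.
print(re.match(SCHEMA_PATTERN, "a=on,b=off") is not None)
print(re.match(MASTER_PATTERN, "a=on,b=off") is not None)
```

The only change is a single inserted space, but it is enough for the master output to reject `a=on,b=off`, which the schema pattern accepts.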

kini, Apr 23 '25 21:04

Added a PR to avoid splitting inside constraint strings; the resulting regex now looks fine:

# generated by datamodel-codegen:
#   filename:  api.yaml
#   timestamp: 2025-05-23T05:47:25+00:00

from __future__ import annotations

from typing import Any, Optional, Union

from pydantic import BaseModel, constr


class Model(BaseModel):
    strOrIntAttribute: Optional[
        Union[Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')]
    ] = None
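
The idea of not splitting inside constraint strings can be illustrated with a small sketch. This is not the code from the PR, and it is only an assumption that the extra spaces came from a comma split followed by a re-join. The functions `naive_rejoin` and `split_top_level` are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

# The inner part of Union[...] from the expected output in this issue.
ARGS = "Any, constr(regex=r'^$|[^=,]+=(?:on|off)+(?:,[^=,]+=(?:on|off)+)*$')"


def naive_rejoin(args: str) -> str:
    """Split on every comma and re-join with ', ' (unaware of quotes)."""
    return ", ".join(part.strip() for part in args.split(","))


def split_top_level(args: str) -> list[str]:
    """Split on commas that sit outside string literals and brackets.

    Simplification: backslash-escaped quotes inside a literal are not handled.
    """
    parts: list[str] = []
    current: list[str] = []
    depth = 0   # nesting level of (), [] and {}
    quote = ""  # active quote character, or "" when outside a literal
    for ch in args:
        if quote:
            # Inside a string literal: copy verbatim until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = ""
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


print(naive_rejoin(ARGS))                 # regex gains spaces after commas
print(", ".join(split_top_level(ARGS)))   # regex is left untouched
```

Under this assumption, the naive version reproduces the spacing reported for 0.25.1, while the quote-aware split round-trips the argument list unchanged.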

minomocca, May 23 '25 05:05