
[PydanticV2] Add parameter to use Python Regex Engine in order to support look-around

Open ilovelinux opened this issue 11 months ago • 4 comments

Is your feature request related to a problem? Please describe.

  1. I wrote a valid JSON Schema with many properties
  2. Some properties' patterns use look-ahead and look-behind, which are supported by the JSON Schema specification. See: JSON Schema supported patterns
  3. datamodel-code-generator generated the PydanticV2 models.
  4. PydanticV2 doesn't support look-around (look-ahead and look-behind) with its default regex engine (see https://github.com/pydantic/pydantic/issues/7058), so loading the generated models fails:
ImportError while loading [...]: in <module>
[...]
.venv/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:205: in __new__
    complete_model_class(
.venv/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:552: in complete_model_class
    cls.__pydantic_validator__ = create_schema_validator(
.venv/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:50: in create_schema_validator
    return SchemaValidator(schema, config)
E   pydantic_core._pydantic_core.SchemaError: Error building "model" validator:
E     SchemaError: Error building "model-fields" validator:
E     SchemaError: Field "version":
E     SchemaError: Error building "str" validator:
E     SchemaError: regex parse error:
E       ^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$
E         ^^^
E   error: look-around, including look-ahead and look-behind, is not supported
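To illustrate why the Python engine helps here, the following is a minimal sketch that compiles the exact pattern from the traceback with the standard-library `re` module. The helper name `is_valid_version` is hypothetical and only for illustration; it is not part of pydantic or datamodel-code-generator.

```python
import re

# Pattern copied verbatim from the traceback above.
VERSION_PATTERN = r"^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$"

# Python's `re` module supports negative look-ahead, so this compiles.
version_re = re.compile(VERSION_PATTERN)


def is_valid_version(value: str) -> bool:
    """Hypothetical helper: True when `value` matches the schema pattern."""
    return version_re.match(value) is not None


print(is_valid_version("1.2.3"))  # dotted numeric components are accepted
print(is_valid_version("01.2"))   # leading zero is rejected by (?!0[0-9])
print(is_valid_version("1.2."))   # trailing dot is rejected by (?!$)
```

The same pattern is rejected by the default Rust-based engine because that engine does not implement look-around at all.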

Describe the solution you'd like

PydanticV2 supports look-around, look-ahead and look-behind when Python is used as the regex engine: https://github.com/pydantic/pydantic/issues/7058#issuecomment-2156772918

I'd like to have a configuration parameter to use python-re as regex engine for Pydantic V2.

Describe alternatives you've considered

Workaround:

  1. Create a custom BaseModel
from pydantic import BaseModel, ConfigDict


class _BaseModel(BaseModel):
    model_config = ConfigDict(regex_engine='python-re')
  2. Use that class as the base class:
datamodel-codegen --base-class "module.with.basemodel._BaseModel"

EDIT:

Configuration used:

[tool.datamodel-codegen]
# Options
input = "<project>/data/schemas/"
input-file-type = "jsonschema"
output = "<project>/models/"
output-model-type = "pydantic_v2.BaseModel"
# Typing customization
base-class = "<project>.models._base_model._BaseModel"
enum-field-as-literal = "all"
use-annotated = true
use-standard-collections = true
use-union-operator = true
# Field customization
collapse-root-models = true
snake-case-field = true
use-field-description = true
# Model customization
disable-timestamp = true
enable-faux-immutability = true
target-python-version = "3.12"
use-schema-description = true
# OpenAPI-only options
#
# We may not want to use these options as we are not generating from OpenAPI schemas
# but this is a workaround to avoid `type | None` when we have a default value.
#
# The author of the tool doesn't know why he flagged this option as OpenAPI only.
# Reference: https://github.com/koxudaxi/datamodel-code-generator/issues/1441
strict-nullable = true

ilovelinux, Dec 17 '24 15:12

datamodel-code-generator should automatically add ConfigDict(regex_engine='python-re') for you when it sees a pattern keyword that uses look-around (look-ahead or look-behind).
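As a rough illustration of that kind of detection, here is a heuristic sketch. It is not the generator's actual implementation; the names `uses_lookaround` and `regex_engine_for` are hypothetical. It only looks for an unescaped `(?=`, `(?!`, `(?<=` or `(?<!`, and it can report false positives for those sequences inside a character class.

```python
import re

# Heuristic: an unescaped "(?=", "(?!", "(?<=" or "(?<!" opens a look-around.
# The prefix skips pairs of backslashes so that "\(" (an escaped paren) is ignored.
_LOOKAROUND = re.compile(r"(?<!\\)(?:\\\\)*\(\?<?[=!]")


def uses_lookaround(pattern: str) -> bool:
    """Hypothetical helper: guess whether a schema pattern needs look-around."""
    return _LOOKAROUND.search(pattern) is not None


def regex_engine_for(pattern: str) -> str:
    """Pick a value for pydantic's `regex_engine` config key."""
    return "python-re" if uses_lookaround(pattern) else "rust-regex"


print(regex_engine_for(r"^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$"))
print(regex_engine_for(r"^[a-z]+$"))
```

A production version would more likely parse the pattern properly rather than scan it with another regex.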

Are you using collapse_root_models=True? I am, and I see that ConfigDict(regex_engine='python-re') is not added to a BaseModel that has merged fields from a RootModel, even though the RootModel had ConfigDict(regex_engine='python-re'). So I am curious whether this is also your use case; if so, it may be a bug.

dpeachey, Jan 23 '25 20:01

@dpeachey I am!

I edited the issue and I added the full config file I'm using 🙂

ilovelinux, Jan 24 '25 15:01

Yeah, then I think it is a bug. Ideally, when using collapse_root_models=True, the ConfigDict from the RootModel should be merged into the ConfigDict of the parent BaseModel.
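The merge being proposed could look roughly like the sketch below. It uses plain dicts in place of ConfigDict, and the function name `merge_model_configs` is hypothetical. Letting the parent's keys win on conflict is an assumption, not something decided in this thread.

```python
from typing import Any


def merge_model_configs(
    parent: dict[str, Any], collapsed: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper: fold a collapsed root model's config into its parent.

    Keys already set on the parent take precedence (an assumption).
    """
    return {**collapsed, **parent}


# Parent model config, e.g. as produced by enable-faux-immutability.
parent_config = {"frozen": True}
# Config the root model carried before it was collapsed.
root_model_config = {"regex_engine": "python-re"}

merged = merge_model_configs(parent_config, root_model_config)
print(merged)
```

With a merge like this, the parent model would keep regex_engine='python-re' after the root model is collapsed into it.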

dpeachey, Jan 24 '25 15:01

PR welcome.

gaborbernat, Feb 06 '25 20:02