datamodel-code-generator
[PydanticV2] Add parameter to use Python Regex Engine in order to support look-around
Is your feature request related to a problem? Please describe.
- I wrote a valid JSON Schema with many properties
- Some properties' patterns make use of look-ahead and look-behind, which are supported by the JSON Schema specification. See: JSON Schema supported patterns
- datamodel-code-generator generated the PydanticV2 models.
- PydanticV2 doesn't support look-around (look-ahead and look-behind) by default (see https://github.com/pydantic/pydantic/issues/7058)
ImportError while loading [...]: in <module>
[...]
.venv/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:205: in __new__
complete_model_class(
.venv/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:552: in complete_model_class
cls.__pydantic_validator__ = create_schema_validator(
.venv/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:50: in create_schema_validator
return SchemaValidator(schema, config)
E pydantic_core._pydantic_core.SchemaError: Error building "model" validator:
E SchemaError: Error building "model-fields" validator:
E SchemaError: Field "version":
E SchemaError: Error building "str" validator:
E SchemaError: regex parse error:
E ^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$
E ^^^
E error: look-around, including look-ahead and look-behind, is not supported
Describe the solution you'd like
PydanticV2 supports look-around (look-ahead and look-behind) when Python is used as the regex engine: https://github.com/pydantic/pydantic/issues/7058#issuecomment-2156772918
I'd like to have a configuration parameter to use python-re as regex engine for Pydantic V2.
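As a quick sanity check that the pattern itself is acceptable to Python's engine, here is a minimal sketch that compiles the regex from the traceback with the standard `re` module. The names `VERSION_PATTERN`, `VERSION_RE` and `matches_version` are illustrative only and belong to neither pydantic nor datamodel-code-generator.

```python
import re

# Pattern copied from the traceback above; it uses negative look-ahead twice.
VERSION_PATTERN = r"^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$"

# Python's `re` module supports look-around, so compilation succeeds here.
VERSION_RE = re.compile(VERSION_PATTERN)


def matches_version(value: str) -> bool:
    """Return True when `value` matches the version pattern."""
    return VERSION_RE.search(value) is not None
```

With this sketch, `matches_version("1.2.3")` is true, while `"01.2"` (leading zero) and `"1.2."` (trailing dot) are rejected by the two look-aheads.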
Describe alternatives you've considered
Workaround:
- Create a custom BaseModel
from pydantic import BaseModel, ConfigDict

class _BaseModel(BaseModel):
    model_config = ConfigDict(regex_engine='python-re')
- Use that class as BaseModel:
datamodel-codegen --base-class "module.with.basemodel._BaseModel"
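To illustrate why the workaround helps, the following sketch defines a model on top of the custom base class and validates a value against the look-ahead pattern from the traceback. `Release` and `accepts` are hypothetical stand-ins (a generated model would have its own name and fields), and the example assumes pydantic v2 is installed.

```python
from typing import Annotated

from pydantic import BaseModel, ConfigDict, Field, ValidationError

# Pattern copied from the traceback above.
VERSION_PATTERN = r"^((?!0[0-9])[0-9]+(\.(?!$)|)){2,4}$"


class _BaseModel(BaseModel):
    # Use Python's `re` instead of the default Rust regex engine.
    model_config = ConfigDict(regex_engine="python-re")


class Release(_BaseModel):
    # Hypothetical stand-in for a generated model with a `pattern` constraint.
    version: Annotated[str, Field(pattern=VERSION_PATTERN)]


def accepts(value: str) -> bool:
    """Return True when `value` passes validation for `Release.version`."""
    try:
        Release(version=value)
    except ValidationError:
        return False
    return True
```

Because `model_config` is inherited from `_BaseModel`, the class definition no longer raises `SchemaError`, and `accepts("1.2.3")` succeeds while `accepts("01.2")` fails validation.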
EDIT:
Configuration used:
[tool.datamodel-codegen]
# Options
input = "<project>/data/schemas/"
input-file-type = "jsonschema"
output = "<project>/models/"
output-model-type = "pydantic_v2.BaseModel"
# Typing customization
base-class = "<project>.models._base_model._BaseModel"
enum-field-as-literal = "all"
use-annotated = true
use-standard-collections = true
use-union-operator = true
# Field customization
collapse-root-models = true
snake-case-field = true
use-field-description = true
# Model customization
disable-timestamp = true
enable-faux-immutability = true
target-python-version = "3.12"
use-schema-description = true
# OpenAPI-only options
#
# We may not want to use these options as we are not generating from OpenAPI schemas
# but this is a workaround to avoid `type | None` when we have a default value.
#
# The author of the tool doesn't know why he flagged this option as OpenAPI only.
# Reference: https://github.com/koxudaxi/datamodel-code-generator/issues/1441
strict-nullable = true
datamodel-code-generator should automatically add ConfigDict(regex_engine='python-re') for you when it sees a pattern that uses look-around (look-ahead or look-behind).
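One way such a check could work is sketched below. This is an assumption for illustration, not the generator's actual implementation: `uses_look_around` and `pick_regex_engine` are hypothetical helpers, and the heuristic is deliberately simple.

```python
import re

# Matches the opening of the four look-around constructs:
#   (?=  (?!  (?<=  (?<!
# Simplification: an escaped parenthesis such as `\(?=` is flagged as well.
_LOOK_AROUND = re.compile(r"\(\?(?:<?[=!])")


def uses_look_around(pattern: str) -> bool:
    """Heuristically detect look-ahead or look-behind in a regex string."""
    return _LOOK_AROUND.search(pattern) is not None


def pick_regex_engine(pattern: str) -> str:
    """Return the pydantic `regex_engine` value the pattern would need."""
    return "python-re" if uses_look_around(pattern) else "rust-regex"
```

Non-capturing groups such as `(?:ab)+` and named groups such as `(?<name>...)` are not flagged, so only patterns that appear to need Python's engine would switch away from the default.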
Are you using collapse_root_models=True? I am, and I see that ConfigDict(regex_engine='python-re') is not added to a BaseModel that has merged fields from a RootModel, even though the RootModel had ConfigDict(regex_engine='python-re'). So I am curious whether this is also your use case; if so, it may be a bug.
@dpeachey I am!
I edited the issue and I added the full config file I'm using 🙂
Yeah, then I think it is a bug. Ideally, when using collapse_root_models=True, the ConfigDict from the RootModel should be merged with the ConfigDict in the parent BaseModel.
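The merge described here could look roughly like the sketch below. It treats each config as a plain dict (pydantic's `ConfigDict` is a `TypedDict`); `merge_model_configs` is a hypothetical helper, and letting the parent win on conflicting keys is an assumption rather than the project's decided behavior.

```python
from typing import Any


def merge_model_configs(
    parent: dict[str, Any], collapsed_root: dict[str, Any]
) -> dict[str, Any]:
    """Merge a collapsed RootModel's config into its parent's config."""
    # Assumption: settings on the parent model win on conflicting keys.
    merged = {**collapsed_root, **parent}
    # If either side required Python's regex engine, keep it; otherwise the
    # merged pattern would fail to build under the default engine.
    engines = (parent.get("regex_engine"), collapsed_root.get("regex_engine"))
    if "python-re" in engines:
        merged["regex_engine"] = "python-re"
    return merged
```

For example, merging a parent config of `{"frozen": True}` with a root config of `{"regex_engine": "python-re"}` yields a config that carries both settings.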
PR welcome.