Field validators broken for nested models since v2.3.2
pydantic version: 2.8.0 OS: confirmed broken on Windows and Linux, haven't tested others pydantic-settings version: 2.3.2
Hello, I noticed that there is a breaking change in pydantic-settings==2.3.2, which I think was introduced by PR: https://github.com/pydantic/pydantic-settings/pull/309. Here is a minimal working example:
from pydantic import BaseModel, field_validator
import os
from pydantic_settings import BaseSettings, SettingsConfigDict
class Child(BaseModel):
ATTRIBUTE: list[str]
@field_validator("ATTRIBUTE", mode="before")
def convert_fields(cls, input: str) -> list[str]:
return input.split("|")
class Parent(BaseSettings):
model_config = SettingsConfigDict(
env_prefix="PREFIX_",
env_nested_delimiter="__",
case_sensitive=False,
env_file_encoding="utf-8",
)
child: Child
os.environ["PREFIX_CHILD__ATTRIBUTE"] = "a|b|c"
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]
This test passes for pydantic-settings up to and including 2.3.1, but has been broken since 2.3.2. The test doesn't simply fail: the instantiation of Parent raises with the following traceback:
Traceback (most recent call last):
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 375, in __call__
field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 568, in prepare_field_value
env_val_built = self.explode_env_vars(field_name, field, self.env_vars)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 703, in explode_env_vars
raise e
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 700, in explode_env_vars
env_val = self.decode_complex_value(last_key, target_field, env_val) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 187, in decode_complex_value
return json.loads(value)
^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\software\pydantic-test\test.py", line 26, in <module>
settings = Parent()
^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 141, in __init__
**__pydantic_self__._settings_build_values(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in _settings_build_values
return deep_update(*reversed([source() for source in sources]))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in <listcomp>
return deep_update(*reversed([source() for source in sources]))
^^^^^^^^
File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 377, in __call__
raise SettingsError(
pydantic_settings.sources.SettingsError: error parsing value for field "child" from source "EnvSettingsSource"
So it's trying to JSON-decode the string "a|b|c" and failing. But my field_validator is running in "before" mode
so this shouldn't happen. Indeed, if I simply comment-out the field_validator I see exactly the same error. So I believe that decorator is either being ignored altogether, or "before" mode has been broken.
Thanks @LachlanMarnham for reporting this.
Actually, the example code shouldn't have worked before because ATTRIBUTE field's type is list[str], and pydantic-settings considers it as a complex field. pydantic-settings parses the values of the complex field and this is the reason that you get the error.
By fixing the bug that we had before, you don't need a field_validator anymore. you can pass a list as value and pydantic-settings parses it.
from pydantic import BaseModel
import os
from pydantic_settings import BaseSettings, SettingsConfigDict
class Child(BaseModel):
ATTRIBUTE: list[str]
class Parent(BaseSettings):
model_config = SettingsConfigDict(
env_prefix="PREFIX_",
env_nested_delimiter="__",
case_sensitive=False,
env_file_encoding="utf-8",
)
child: Child
os.environ["PREFIX_CHILD__ATTRIBUTE"] = '["a", "b", "c"]'
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]
Ok thanks @hramezani. I think this is maybe a problem of documentation, with a conflict between pydantic and pydantic-settings. To your point, the pydantic-settings documentation does say:
Complex types like list, set, dict, and sub-models are populated from the environment by treating the environment variable's value as a JSON-encoded string.
Which is fair enough. However, pydantic guarantees that I can avoid this behaviour if I want to:
Before validators run before Pydantic's internal parsing and validation (e.g. coercion of a str to an int). These are more flexible than After validators since they can modify the raw input, but they also have to deal with the raw input, which in theory could be any arbitrary object.
It does feel a bit as if pydantic-settings is breaking a contract with pydantic here, given that internal parsing and validation is in fact taking place before the before validator.
Good point @LachlanMarnham.
The json parsing behavior was in pydanatic-settings v1(actually pydantic. because pydantic-settings was in pydantic in V1). That's why we keep the behavior in V2. I think we can have a config flag to disable parsing and let the people parse the value by before field_validator
I think we can have a config flag...
Do you mean a flag in the SettingsConfigDict? I guess such a flag can't really be added to the field_validator itself since that's owned by pydantic. Personally I don't rely on any settings being JSON, but I can imagine that some people might think it is a bit too much for the two options to be "all fields must be JSON" and "JSON de-serlialisation will be switched off anywhere" (unless I am mis-understanding you).
Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.
Thanks
~~Yes, I meant a flag in pydantic-settings SettingsConfigDict like enable_json_parsing which has two values:~~
- ~~
True- keeps the current behavior of pydantic-setting and is the default to prevent breaking change~~ - ~~
False- disables the json parsing of values and pass the value to pydantic. we can havefield_validatorin this case to parse the values if we want.~~
Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.
Yes, I think the current behaviour is correct. what do you think? which part is wrong?
I can imagine that some people might think it is a bit too much for the two options to be "all fields must be JSON" and "JSON de-serlialisation will be switched off anywhere"
I am "some people" here :smile: I would want to do custom parsing for only certain fields. How about some way to specify a list of fields for which to disable JSON parsing? Maybe a disable_json_parsing_for entry in SettingsConfigDict that takes a list of field names?
@fajpunk I updated my previous comment. the idea is to implement something like Annotated[int, Nodecode] to disable JSON parsing on the field level.
The json parsing behavior was in
pydanatic-settings v1(actually pydantic. because pydantic-settings was in pydantic in V1). That's why we keep the behavior in V2.
Was the behavior present in V1? I just updated a pre validator from v1 that always worked fine to a before validator and ran into this issue
@Numerlor I am not sure. Could you give me the validator that works in V1 and doesn't work in V2. I can probably help you if there is a way
@Numerlor I am not sure. Could you give me the validator that works in V1 and doesn't work in V2. I can probably help you if there is a way
The behaviors seem to differ between nested models and attributes directly on BaseSettings in v1, directly on BaseSettings produces the same error, nested models are parsed without issue
repro code v2
Same with BeforeValidator in Annotated, but this is the more direct translation of the v1 code
import json
import os
import typing
import pydantic
import pydantic_settings
from pydantic_settings import SettingsConfigDict
def parse_json_or_space_delimited_list(string: str) -> list[str]:
try:
return typing.cast(list[str], json.loads(string))
except json.JSONDecodeError:
return string.split(" ")
class Model(pydantic.BaseModel):
test: list[str] = []
test_validator = pydantic.field_validator("test", mode="before")(parse_json_or_space_delimited_list)
class Settings(pydantic_settings.BaseSettings):
test: Model
model_config = SettingsConfigDict(env_nested_delimiter="__")
os.environ["test__test"] = "a b c"
Settings()
Traceback (most recent call last):
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 375, in __call__
field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 577, in prepare_field_value
raise e
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 574, in prepare_field_value
value = self.decode_complex_value(field_name, field, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 187, in decode_complex_value
return json.loads(value)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/Users/numerlor/AppData/Roaming/JetBrains/PyCharm2024.2/scratches/scratch.py", line 21, in <module>
Settings()
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/main.py", line 141, in __init__
**__pydantic_self__._settings_build_values(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/main.py", line 311, in _settings_build_values
return deep_update(*reversed([source() for source in sources]))
^^^^^^^^
File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 377, in __call__
raise SettingsError(
pydantic_settings.sources.SettingsError: error parsing value for field "test" from source "EnvSettingsSource"
working v1
import json
import os
import typing
import pydantic.v1
def parse_json_or_space_delimited_list(string: str) -> list[str]:
try:
return typing.cast(list[str], json.loads(string))
except json.JSONDecodeError:
return string.split(" ")
class Model(pydantic.v1.BaseModel):
test: list[str] = []
test_validator = pydantic.v1.validator("test", pre=True)(parse_json_or_space_delimited_list)
class Settings(pydantic.v1.BaseSettings):
test: Model
class Config:
env_nested_delimiter="__"
os.environ["test__test"] = "a b c"
Settings()
@Numerlor, in V2, json.loads is used for decoding complex values. So, when pydantic-settings collects values and tries to decode complex values by json.loads. Right now, it is not possible to bypass the default behavior. but you can fix your problem by a custom env source settings class like:
import os
from typing import Any, Tuple, Type
from pydantic import BaseModel
from pydantic.fields import FieldInfo
from pydantic_settings import BaseSettings, SettingsConfigDict, EnvSettingsSource, PydanticBaseSettingsSource
class MyEnvSettingsSource(EnvSettingsSource):
def decode_complex_value(self, field_name: str, field: FieldInfo, value: Any) -> Any:
return value.split(" ")
class NestedSettings(BaseModel):
test: list[str]
class Settings(BaseSettings):
test: NestedSettings
model_config = SettingsConfigDict(env_nested_delimiter="__")
@classmethod
def settings_customise_sources(
cls,
settings_cls: Type[BaseSettings],
init_settings: PydanticBaseSettingsSource,
env_settings: PydanticBaseSettingsSource,
dotenv_settings: PydanticBaseSettingsSource,
file_secret_settings: PydanticBaseSettingsSource,
) -> Tuple[PydanticBaseSettingsSource, ...]:
return init_settings, MyEnvSettingsSource(settings_cls), dotenv_settings, file_secret_settings
os.environ["test__test"] = "a b c"
print(Settings())
For what it's worth, I've also just run into this issue, trying to do something like the following:
import os
from typing import Annotated
from pydantic import BeforeValidator
from pydantic_settings import BaseSettings
CommaSet = Annotated[set[str], BeforeValidator(lambda s: set(x for x in s.split(",") if x))]
class Example(BaseSettings):
EXAMPLE: CommaSet
os.environ["EXAMPLE"] = "asdf,fdsa"
print(Example())
It really would be nice if there was a way to tell pydantic-settings what the expected 'input' type for the type is and base the 'is complex value' logic on that. There is the 'WithJsonSchema' annotator already in Pydantic for example, maybe somethinge like that. Or just something like a 'DontDeJSON' annotator that pydantic-settings respects.
I created https://github.com/pydantic/pydantic-settings/pull/492 to add NoDecode and ForceDecode annotations. Also the enable_decoding config can control the decoding behavior for the whole model fields
Addressed in 3667aed