Validate enum values against other possible forms of expected values
Change Summary
Validate enum values against other possible forms of expected values, specifically tuples. The problem is actually bigger than what the initial commit solves. Please see my comment.
Related issue number
Fixes https://github.com/pydantic/pydantic/issues/10629
Checklist
- [x] Unit tests for the changes exist
- [ ] Documentation reflects the changes where applicable
- [x] Pydantic tests pass with this
pydantic-core(except for expected changes) - [x] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers
Selected Reviewer: @sydney-runkle
CodSpeed Performance Report
Merging #1502 will not alter performance
Comparing changhc:10629-fix-enum (06ac56d) with main (cd0346d)
Summary
✅ 155 untouched benchmarks
Codecov Report
Attention: Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.
Project coverage is 89.39%. Comparing base (
ab503cb) to head (06ac56d). Report is 226 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/validators/enum_.rs | 96.96% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1502 +/- ##
==========================================
- Coverage 90.21% 89.39% -0.83%
==========================================
Files 106 112 +6
Lines 16339 17966 +1627
Branches 36 40 +4
==========================================
+ Hits 14740 16060 +1320
- Misses 1592 1886 +294
- Partials 7 20 +13
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/serializers/config.rs | 94.62% <100.00%> (+0.17%) |
:arrow_up: |
| src/serializers/mod.rs | 100.00% <ø> (ø) |
|
| src/validators/enum_.rs | 93.37% <96.96%> (ø) |
... and 53 files with indirect coverage changes
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 2419981...06ac56d. Read the comment docs.
This issue lies mostly in validating JSON values. Take the case in the issue as an example:
from enum import Enum
class A(Enum):
a = 1, 2
serialised = RootModel[A](A.a).model_dump_json()
parsed = RootModel[A].model_validate_json(serialised) # fail
What happens here is that A.a is serialised as [1, 2], but the validator is built with one possible enum value (1, 2) and thus cannot find any possible value that matches the input value [1, 2].
My initial commit does fix this issue by adding list(tuple) to the validator, but this does not cover all cases. The complete solution is to load the validator with serialised possible enum values and use them when the validator deals with JSON input values. Think the following example:
class Model(BaseModel):
x: int
@model_serializer
def ser_model(self):
return [self.x]
class A(Enum):
a = 2, 3
b = Model(x=1)
serialised = RootModel[A](A.b).model_dump_json() # [1]
parsed = RootModel[A].model_validate_json(serialised) # fail
To fix this issue completely, we need to know the serialised form of all enum values and add them to the validator. My initial fix solves the problem because tuples are serialised as JSON arrays and casting tuples to lists covers this case. The problem for me now is that I don't know how to find the corresponding serialisers given a PyAny instance. Since we know what possible enum values are and they are a limited set, I imagine this to be straightforward. @davidhewitt could you point me to the right place?
By the way, the following case is expected:
class MyEnum(Enum):
a = 1, 2
b = [1, 2]
v = SchemaValidator(core_schema.enum_schema(MyEnum, list(MyEnum.__members__.values())))
assert v.validate_json('[1, 2]') is MyEnum.a
Since both MyEnum.a and MyEnum.b are identical after serialisation, there is no way to determine what a JSON value represented before it was serialised, the first matched enum value is returned.
I think you might want to be looking in infer.rs to see what we would serialize for an arbitrary object?
Hi @davidhewitt, I've finally come up with a seemingly generalisable solution, but I would like to check with you and see if I missed out anything.
The idea is basically that we create a separate lookup in EnumValidator that contains the JSON form of all enum values and use this lookup for validation when the input type is JSON. The JSON form is acquired by calling serializers::to_jsonable_python. So far my implementation works with the following scenarios:
# root model with enum round trip
class Model(BaseModel):
x: int
@model_serializer
def ser_model(self):
return [self.x]
class A(Enum):
e = "1+2j"
a = 2, 3
b = complex(1, 2)
c = datetime.datetime.fromisoformat("2024-01-01T00:00:00Z")
d = Model(x=2)
f = float('inf')
for v in A.__members__.values():
RootModel[A].model_validate_json(RootModel[A](v).model_dump_json())
# base model with enum round trip
class M(BaseModel):
v: A
#model_config = ConfigDict(ser_json_inf_nan='strings', use_enum_values=True)
for v in A.__members__.values():
M.model_validate_json(M(v=v).model_dump_json())
I'm not quite sure about calling to_jsonable_python because ideally we should compose the JSON lookup using EnumSerializer so that it would be a round trip without any unexpected mismatch, e.g. think the complex value inference I missed out in my original implementation, but meanwhile that means exposing quite a few things from the serializer module and I'm not sure if that's the right thing to do.
At the same time, I have some questions regarding implementations around to_python(mode='JSON'), especially regarding float. Do you know why here it does not take inf_nan_mode into consideration? This is a case where my implementation does not fully work. In the class M above, if you uncomment model_config, it gives an error when processing A.f, that float('inf'):
pydantic_core._pydantic_core.ValidationError: 1 validation error for M
v
Input should be '1+2j', (2, 3), (1+2j), datetime.datetime(2024, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), Model(x=2) or inf [type=enum, input_value='Infinity', input_type=str]
because the serialized value in model_dump_json ("Infinity") does not match the one acquired with to_jsonable_python (inf).
By the way, I'm still listing the original values in the error message instead of listing them in the JSON form because I'm not sure if showing the JSON form would be confusing.
Sorry if this is not very clear because I'm in a bit of a rush, but please let me know what you think. Thank you!
please review