pydantic-core
pydantic-core copied to clipboard
add default comparison
This PR tries to solve this pydantic issue . In my work I use objects of the type np.ndarray or pd.DataFrame a lot and excluding the defaults values has been a bit annoying due to the error mentioned in the issue. The idea of this PR is to add an extra optional argument in Field that I called default_comparison, which is a callable that receives two arguments value and default and replaces the __eq__ call of the object with a call to this function
Checklist
- [ x] Unit tests for the changes exist
- [ ] Documentation reflects the changes where applicable
- [x ] Pydantic tests pass with this
pydantic-core(except for expected changes) - [ x] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers
Selected Reviewer: @adriangb
Codecov Report
Merging #1114 (d1f2343) into main (7fa450d) will increase coverage by
0.01%. The diff coverage is95.45%.
:exclamation: Current head d1f2343 differs from pull request most recent head 0614011. Consider uploading reports for the commit 0614011 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #1114 +/- ##
==========================================
+ Coverage 89.70% 89.72% +0.01%
==========================================
Files 106 106
Lines 16364 16380 +16
Branches 35 35
==========================================
+ Hits 14680 14697 +17
+ Misses 1677 1676 -1
Partials 7 7
| Files | Coverage Δ | |
|---|---|---|
| python/pydantic_core/core_schema.py | 94.76% <ø> (ø) |
|
| src/serializers/fields.rs | 95.02% <100.00%> (+0.37%) |
:arrow_up: |
| src/serializers/shared.rs | 78.02% <100.00%> (-1.31%) |
:arrow_down: |
| src/serializers/type_serializers/with_default.rs | 81.66% <94.11%> (+4.39%) |
:arrow_up: |
... and 3 files with indirect coverage changes
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 7fa450d...0614011. Read the comment docs.
CodSpeed Performance Report
Merging #1114 will degrade performances by 13.46%
Comparing andresliszt:add/default-comparison (0614011) with main (7fa450d)
Summary
❌ 1 regressions
✅ 139 untouched benchmarks
:warning: Please fix the performance issues or acknowledge them on CodSpeed.
Benchmarks breakdown
| Benchmark | main |
andresliszt:add/default-comparison |
Change | |
|---|---|---|---|---|
| ❌ | test_small_class_core_model |
50.2 µs | 58 µs | -13.46% |
please review
Hi @andresliszt thanks for the PR! I spoke to @adriangb a bit this morning and we think this is potentially reasonable to move forward with; we can't see a better solution for the exclude_defaults situation with numpy. My only question is how we think this addition will get exposed in the Python side in pydantic?
Also, on the linked issue I commented about exclude_unset; maybe this solves the problem without needing to "fix" exclude_defaults.
Hi @davidhewitt ! I was thinking to expose it using pydantic Field, adding a new argument default_comparison
import numpy as np
from pydantic import BaseModel, ConfigDict, Field
class Model(BaseModel):
my_field: np.ndarray = Field(default = np.array(), default_comparison = np.array_equal)
model_config = ConfigDict(arbitrary_types_allowed = True)
That would definitely be the obvious choice. Some other idea could be to make a special Default type to avoid a new parameter to the already-bloated Field:
import numpy as np
from pydantic import BaseModel, ConfigDict, Field, Default
class Model(BaseModel):
my_field: np.ndarray = Field(default = Default(np.array(), compare_as = np.array_equal))
model_config = ConfigDict(arbitrary_types_allowed = True)
(credit @adriangb)
After I sent my PR i started to think that excluding defaults is not the only place where the python __eq__ gets invoked. A simple model comparison will fail with numpy
Model(my_field = np.array([1,2,3])) == Model(my_field = np.array([1,2,3]))
>>> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think adding compare_as to Field would solve it using it everywhere __eq__ is invoked
Interesting observation, I like the generality of this solution. attrs has this functionality also, so there is prior art.
https://www.attrs.org/en/stable/comparison.html#customization
I was poking around pydantic.BaseModel.__eq__ and due this comparison i think implement compare_as in Field is a bit complicated. I was reading the comments and dict comprehensions are not a choice due to the impact in the performance.
My first idea was to create a class, for instance
class FieldComparator:
def __init__(self, value: Any, compare_as: Callable):
self.value = value
self.comapre_as = compare_as
def __eq__(self, other):
return self.compare_as(self.value, other)
And replace all values in __dict__ for whose fields that need to overwrite the __eq__ method
# Naive way
self_dict = {k: FieldComparator(value = v, compare_as = self.model_fields[k].compare_as) if self.model_fields[k].compare_as is not None else v for k,v in self.__dict__.items()}
Then using self_dict instead of self.__dict__ in the downstreams would address the original issue.
I didn't try this and i don't know how it impacts in the performance, but what about creating a new attribute in the Rust side, say __pydantic_fields_overwrite_eq__ that is built in the same place that __pydantic_fields_set__ is built (here (?))
making easier the way self_dict is built