pydantic-core icon indicating copy to clipboard operation
pydantic-core copied to clipboard

add default comparison

Open andresliszt opened this issue 1 year ago • 9 comments

This PR tries to solve this pydantic issue . In my work I use objects of the type np.ndarray or pd.DataFrame a lot and excluding the defaults values has been a bit annoying due to the error mentioned in the issue. The idea of this PR is to add an extra optional argument in Field that I called default_comparison, which is a callable that receives two arguments value and default and replaces the __eq__ call of the object with a call to this function

Checklist

  • [ x] Unit tests for the changes exist
  • [ ] Documentation reflects the changes where applicable
  • [x ] Pydantic tests pass with this pydantic-core (except for expected changes)
  • [ x] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @adriangb

andresliszt avatar Dec 03 '23 23:12 andresliszt

Codecov Report

Merging #1114 (d1f2343) into main (7fa450d) will increase coverage by 0.01%. The diff coverage is 95.45%.

:exclamation: Current head d1f2343 differs from pull request most recent head 0614011. Consider uploading reports for the commit 0614011 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
+ Coverage   89.70%   89.72%   +0.01%     
==========================================
  Files         106      106              
  Lines       16364    16380      +16     
  Branches       35       35              
==========================================
+ Hits        14680    14697      +17     
+ Misses       1677     1676       -1     
  Partials        7        7              
Files Coverage Δ
python/pydantic_core/core_schema.py 94.76% <ø> (ø)
src/serializers/fields.rs 95.02% <100.00%> (+0.37%) :arrow_up:
src/serializers/shared.rs 78.02% <100.00%> (-1.31%) :arrow_down:
src/serializers/type_serializers/with_default.rs 81.66% <94.11%> (+4.39%) :arrow_up:

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7fa450d...0614011. Read the comment docs.

codecov[bot] avatar Dec 03 '23 23:12 codecov[bot]

CodSpeed Performance Report

Merging #1114 will degrade performances by 13.46%

Comparing andresliszt:add/default-comparison (0614011) with main (7fa450d)

Summary

❌ 1 regressions ✅ 139 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main andresliszt:add/default-comparison Change
test_small_class_core_model 50.2 µs 58 µs -13.46%

codspeed-hq[bot] avatar Dec 03 '23 23:12 codspeed-hq[bot]

please review

andresliszt avatar Dec 06 '23 21:12 andresliszt

Hi @andresliszt thanks for the PR! I spoke to @adriangb a bit this morning and we think this is potentially reasonable to move forward with; we can't see a better solution for the exclude_defaults situation with numpy. My only question is how we think this addition will get exposed in the Python side in pydantic?

Also, on the linked issue I commented about exclude_unset; maybe this solves the problem without needing to "fix" exclude_defaults.

davidhewitt avatar Dec 13 '23 13:12 davidhewitt

Hi @davidhewitt ! I was thinking to expose it using pydantic Field, adding a new argument default_comparison

import numpy as np
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    my_field: np.ndarray = Field(default = np.array(), default_comparison = np.array_equal)
    model_config = ConfigDict(arbitrary_types_allowed = True)

andresliszt avatar Dec 13 '23 15:12 andresliszt

That would definitely be the obvious choice. Some other idea could be to make a special Default type to avoid a new parameter to the already-bloated Field:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field, Default

class Model(BaseModel):
    my_field: np.ndarray = Field(default = Default(np.array(), compare_as = np.array_equal))
    model_config = ConfigDict(arbitrary_types_allowed = True)

(credit @adriangb)

davidhewitt avatar Dec 13 '23 15:12 davidhewitt

After I sent my PR i started to think that excluding defaults is not the only place where the python __eq__ gets invoked. A simple model comparison will fail with numpy

Model(my_field = np.array([1,2,3])) == Model(my_field = np.array([1,2,3])) 
>>> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I think adding compare_as to Field would solve it using it everywhere __eq__ is invoked

andresliszt avatar Dec 13 '23 16:12 andresliszt

Interesting observation, I like the generality of this solution. attrs has this functionality also, so there is prior art.

https://www.attrs.org/en/stable/comparison.html#customization

davidhewitt avatar Dec 13 '23 16:12 davidhewitt

I was poking around pydantic.BaseModel.__eq__ and due this comparison i think implement compare_as in Field is a bit complicated. I was reading the comments and dict comprehensions are not a choice due to the impact in the performance.

My first idea was to create a class, for instance

class FieldComparator:
    def __init__(self, value: Any, compare_as: Callable):
        self.value = value
        self.comapre_as = compare_as
    def __eq__(self, other):
        return self.compare_as(self.value, other)

And replace all values in __dict__ for whose fields that need to overwrite the __eq__ method

# Naive way
self_dict = {k: FieldComparator(value = v, compare_as = self.model_fields[k].compare_as) if self.model_fields[k].compare_as is not None else v for k,v in self.__dict__.items()}

Then using self_dict instead of self.__dict__ in the downstreams would address the original issue.

I didn't try this and i don't know how it impacts in the performance, but what about creating a new attribute in the Rust side, say __pydantic_fields_overwrite_eq__ that is built in the same place that __pydantic_fields_set__ is built (here (?)) making easier the way self_dict is built

andresliszt avatar Dec 15 '23 20:12 andresliszt