pydantic-core
Add `unique` constraint to list?
I'm not sure about this; it would need to have minimal or no impact on performance when switched off.
Some questions if we are to support it:
- Should we implement `unique` only on the list validator, or also on tuple, set, frozenset and general iterables as we build the vec?
- Should we implement a `unique` check on `set` and `frozenset` validators? This would have minimal performance impact, as it would involve simply checking the length of the input before and after creating the set — but sets lose order, so it won't be sufficient for some scenarios.
- What's the most performant way to implement this in rust? I guess for `list[int]` and a few other types we could do something pretty performant with an `AHashSet`, but the general case will require some fairly slow python.
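The length-before/after check described above can be sketched in pure Python for illustration (the real validator would live in Rust; `has_duplicates` is a hypothetical helper, and it only works for hashable items):

```python
def has_duplicates(items: list) -> bool:
    """Return True if any value appears more than once (hashable items only)."""
    # Building a set drops duplicates, so a length mismatch means the
    # input contained at least one repeated value.
    return len(set(items)) != len(items)


assert has_duplicates([1, 2, 2, 3]) is True
assert has_duplicates([1, 2, 3]) is False
```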
I may be confused, but why set and frozenset? Those collections don't allow duplicate values.
About the unique implementation, would it be compared by equality or by identity?
equality.
> I may be confused, but why `set` and `frozenset`? Those collections don't allow duplicate values.
Well, the question is whether `[1, 2, 2, 3]` as input to a set should raise an error - I think that's a useful option, but it doesn't solve all problems since sets lose order.
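To illustrate the point: today, coercion to a set silently discards duplicates, so without an opt-in check the caller never learns that the input contained repeats (stdlib-only sketch):

```python
raw = [1, 2, 2, 3]
coerced = set(raw)

# Coercion silently drops the duplicate: no error is raised, and both the
# duplicate and the original ordering of the input are lost.
assert coerced == {1, 2, 3}
assert len(raw) != len(coerced)  # an opt-in `unique` check could turn this mismatch into an error
```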
Ok, I understood. I think it's okay and is up to the developer whether to use it or not.
I'll try to do an implementation as a draft or as a reference.
Great, thank you.
Q: You are doing this in python? Or rust? (Asking out of curiosity >> I want to learn how this stuff works.)
this needs to be in rust since the list validator is all written in rust, see https://github.com/pydantic/pydantic-core/blob/9e54f7833a0d88daf9696430756007f0f7f8dc0c/src/validators/list.rs#L96
After the removal of `unique_items` in v2 and this issue, there is no way in v2 to specify a unique array when serializing the model.
> After the removal of `unique_items` in v2 and this issue, there is no way in v2 to specify a unique array when serializing the model.
Yeah, this one was an unfortunate loss for me; I just want a List[T] with unique, sorted elements.
I think this is a way you can achieve this in v2:
```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, PlainSerializer


def require_sorted_unique(v):
    if v != sorted(set(v)):
        raise ValueError('not sorted unique')
    return v


RequireSortUniqueDuringValidation = AfterValidator(require_sorted_unique)
SortUniqueDuringValidation = AfterValidator(lambda v: sorted(set(v)))
SortUniqueDuringSerialization = PlainSerializer(lambda v: sorted(set(v)))


class Model(BaseModel):
    x: Annotated[list[int], SortUniqueDuringValidation]
    y: Annotated[list[int], SortUniqueDuringSerialization]
    z: Annotated[list[int], RequireSortUniqueDuringValidation]


some_ints = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
m = Model(x=some_ints, y=some_ints, z=[])
print(m)
#> x=[1, 2, 3, 4, 5] y=[5, 5, 4, 4, 3, 3, 2, 2, 1, 1] z=[]
print(m.model_dump())
#> {'x': [1, 2, 3, 4, 5], 'y': [1, 2, 3, 4, 5], 'z': []}

Model(x=[], y=[], z=[5, 4])
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Model
z
  Value error, not sorted unique [type=value_error, input_value=[5, 4], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.2/v/value_error
"""
```
Does that work for you?
@dmontagu the simplicity of the `conlist` implementation vs your solution is difficult to beat. I have some experience with pydantic now and it still took me too long to understand what your solution actually entails. I am definitely in favour of getting `conlist` working again as it was.
Here is a real use case. MongoDB + Pydantic.
```python
class Document:
    unique_elements: set[int]
```
Using `set` will give an error from pymongo:
```
bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>
```
It would be convenient to have something like
```python
class Document:
    unique_elements: Annotated[list[int], ListConstraints(unique_items=True)]
```
Of course I can add a validator (or to use a workaround above https://github.com/pydantic/pydantic-core/issues/296#issuecomment-1622283463)
```python
class Document(BaseModel):
    unique_elements: list[int]

    @model_validator(mode='after')
    def check_unique(self):
        # validate here that `unique_elements` has only unique elements...
        return self
```
but restricting a list/tuple seems to be within the scope of Pydantic.
At the worst case, https://github.com/pydantic/pydantic-core/issues/296#issuecomment-1622283463 can be added to the package but with a warning in the documentation that this is a potentially slow operation (pure Python).
In my case I used `unique_items` with `list[...]`, as I need to keep the items in the same order they were provided, and `unique_items` was a perfect option for that.
Now:
- if I use `set[...]`, the items are in random order;
- if I use `list[...]`, there is no `uniqueItems` in the schema and no validation.
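For the order-preserving case, a common pure-Python workaround (a sketch, assuming hashable items) is `dict.fromkeys`, since dict keys are unique and keep insertion order:

```python
def dedupe_preserving_order(items: list) -> list:
    # dict keys are unique and preserve insertion order, so this removes
    # duplicates while keeping the first occurrence of each value in place.
    return list(dict.fromkeys(items))


assert dedupe_preserving_order([3, 1, 3, 2, 1]) == [3, 1, 2]
```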
Are there any critical issues with this option? I can try to create a PR to bring it back.
See https://github.com/pydantic/pydantic-core/pull/820#issuecomment-1670475909, I think this solves all of the requirements and will be about as fast as it can be.
Before, we had `unique=True`, but now we need several lines of code. If this is absolutely necessary, can we at least have some syntactic sugar for it and bring `unique` back that way?
Note that `set` is unordered. If we really have to use `set`, we will have to use an ordered implementation from outside the standard library.
Edit: I changed my mind, we should follow @adriangb's example.
@adriangb, do you think we should leave this open?
I think this can be closed as a WontFix given the alternatives offered.
Closing; if someone really wants this built into pydantic-core, we would of course be willing to review a PR.
BTW, https://github.com/pydantic/pydantic-core/pull/820#issuecomment-1670475909 won't work for inputs which are not hashable, so a full solution would have to compare every pair of items in the list for equality, which I think would be O(n^2) - more reasons not to implement this in core.
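A general-purpose check handling unhashable items might fall back from the O(n) set approach to O(n²) pairwise equality, roughly as sketched below (`all_unique` is an illustrative helper, not part of pydantic):

```python
def all_unique(items: list) -> bool:
    """True if no two items compare equal; also handles unhashable items."""
    try:
        # Fast path: O(n) via a set, works when all items are hashable.
        return len(set(items)) == len(items)
    except TypeError:
        # Slow path: O(n^2) pairwise equality for unhashable items.
        seen: list = []
        for item in items:
            if any(item == s for s in seen):
                return False
            seen.append(item)
        return True


assert all_unique([1, 2, 3])
assert not all_unique([[1], [1]])  # unhashable lists, compared by equality
```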