
Add `unique` constraint to list?

Open samuelcolvin opened this issue 3 years ago • 14 comments

I'm not sure about this, it would need to have minimal or no impact on performance when switched off.

Some questions if we are to support it:

  • Should we implement unique only on the list validator, or also on tuple, set, frozenset and general iterables as we build the vec?
  • Should we implement a unique check on the set and frozenset validators? This would have minimal performance impact, since it only involves checking that the length of the input matches the length of the resulting set, but sets lose order, so it won't be sufficient for some scenarios.
  • What's the most performant way to implement this in Rust? I guess for list[int] and a few other types we could do something pretty performant with an AHashSet, but the general case will require some fairly slow Python.
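As an illustration of the hash-set idea in the last bullet, here is a minimal Python sketch (not code from pydantic-core, and not the Rust implementation discussed). The function name `first_duplicate_index` is hypothetical, and the sketch assumes every item is hashable; it reports the position of the first repeated item so an error could point at it.

```python
from typing import Hashable, Iterable, Optional


def first_duplicate_index(items: Iterable[Hashable]) -> Optional[int]:
    """Return the index of the first repeated item, or None if all items are unique."""
    seen = set()  # plays the role an AHashSet would play in Rust
    for index, item in enumerate(items):
        if item in seen:
            return index  # item compares equal to one seen earlier
        seen.add(item)
    return None


print(first_duplicate_index([1, 2, 2, 3]))  # 2
print(first_duplicate_index([3, 1, 2]))     # None
```

The check is a single pass with average O(1) membership tests, and it leaves the input order untouched, which is the property the set-based option cannot offer.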

samuelcolvin avatar Oct 17 '22 15:10 samuelcolvin

I may be confused, but why set and frozenset? Those collections don't allow duplicate values.

About the unique implementation, would it be compared by equality or by identity?

odiseo0 avatar Oct 20 '22 16:10 odiseo0

equality.

> I may be confused, but why set and frozenset? Those collections don't allow duplicate values.

Well, the question is whether [1, 2, 2, 3] as input to a set should raise an error. I think that's a useful option, but it doesn't solve all problems since sets lose order.
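A rough Python sketch of what such an opt-in check could look like, assuming hashable items. `to_set_strict` is a hypothetical helper name, not a pydantic or pydantic-core API; it only illustrates the "compare lengths before and after" idea.

```python
from typing import Hashable, Iterable, Set


def to_set_strict(items: Iterable[Hashable]) -> Set[Hashable]:
    """Build a set, but reject input that contained duplicates."""
    materialized = list(items)  # need the original length, so consume the iterable once
    result = set(materialized)
    if len(result) != len(materialized):
        dropped = len(materialized) - len(result)
        raise ValueError(f"expected unique items, found {dropped} duplicate(s)")
    return result


print(to_set_strict([1, 2, 3]) == {1, 2, 3})  # True
# to_set_strict([1, 2, 2, 3]) raises ValueError
```

The returned set still carries no ordering guarantee, so this only helps when order does not matter.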

samuelcolvin avatar Oct 20 '22 16:10 samuelcolvin

Ok, I understood. I think it's okay and is up to the developer whether to use it or not.

I'll try to do an implementation as a draft or as a reference.

odiseo0 avatar Oct 20 '22 17:10 odiseo0

Great, thank you.

samuelcolvin avatar Oct 20 '22 17:10 samuelcolvin

> Ok, I understood. I think it's okay and is up to the developer whether to use it or not.
>
> I'll try to do an implementation as a draft or as a reference.

Q: Are you doing this in Python or in Rust? (Asking out of curiosity; I want to learn how this stuff works.)

ybressler avatar Dec 30 '22 20:12 ybressler

This needs to be in Rust, since the list validator is written entirely in Rust; see https://github.com/pydantic/pydantic-core/blob/9e54f7833a0d88daf9696430756007f0f7f8dc0c/src/validators/list.rs#L96

samuelcolvin avatar Dec 30 '22 20:12 samuelcolvin

After the removal of unique_items in v2 and this issue there is no way in v2 to specify a unique array when serializing the model.

sasanjac avatar May 17 '23 13:05 sasanjac

> After the removal of unique_items in v2 and this issue there is no way in v2 to specify a unique array when serializing the model.

Yeah, this one was an unfortunate loss for me; I just want a List[T] with unique, sorted elements.

PaarthShah avatar Jul 01 '23 01:07 PaarthShah

I think this is a way you can achieve this in v2:

from typing import Annotated

from pydantic import BaseModel, AfterValidator, PlainSerializer


def require_sorted_unique(v):
    if v != sorted(set(v)):
        raise ValueError('not sorted unique')
    return v


RequireSortUniqueDuringValidation = AfterValidator(require_sorted_unique)
SortUniqueDuringValidation = AfterValidator(lambda v: sorted(set(v)))
SortUniqueDuringSerialization = PlainSerializer(lambda v: sorted(set(v)))


class Model(BaseModel):
    x: Annotated[list[int], SortUniqueDuringValidation]
    y: Annotated[list[int], SortUniqueDuringSerialization]
    z: Annotated[list[int], RequireSortUniqueDuringValidation]


some_ints = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
m = Model(x=some_ints, y=some_ints, z=[])
print(m)
# > x=[1, 2, 3, 4, 5] y=[5, 5, 4, 4, 3, 3, 2, 2, 1, 1] z=[]
print(m.model_dump())
# > {'x': [1, 2, 3, 4, 5], 'y': [1, 2, 3, 4, 5], 'z': []}

Model(x=[], y=[], z=[5, 4])
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Model
z
  Value error, not sorted unique [type=value_error, input_value=[5, 4], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.2/v/value_error
"""

Does that work for you?

dmontagu avatar Jul 05 '23 18:07 dmontagu

@dmontagu the simplicity of the conlist implementation vs your solution is difficult to beat. I have some experience in pydantic now and it still took me too long to understand what your solution actually entails. I am definitely in favour of getting the conlist working again as it was.

orfisko avatar Jul 14 '23 14:07 orfisko

Here is a real use case. MongoDB + Pydantic.

class Document:
    unique_elements: set[int]

Using set will give an error from pymongo:

bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>

It would be convenient to have something like

class Document:
    unique_elements: Annotated[list[int], ListConstraints(unique_items=True)]

Of course, I can add a validator (or use the workaround above, https://github.com/pydantic/pydantic-core/issues/296#issuecomment-1622283463):

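from pydantic import BaseModel, model_validator
class Document(BaseModel):
    unique_elements: list[int]

    @model_validator(mode='after')
    def check_unique_elements(self):
        # validate here that `unique_elements` has only unique elements
        if len(set(self.unique_elements)) != len(self.unique_elements):
            raise ValueError('unique_elements must contain unique items')
        return self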

but restricting a list/tuple seems to be within the scope of Pydantic.

In the worst case, https://github.com/pydantic/pydantic-core/issues/296#issuecomment-1622283463 could be added to the package, with a warning in the documentation that this is a potentially slow operation (pure Python).

simon-liebehenschel avatar Jul 27 '23 19:07 simon-liebehenschel

In my case I use unique_items for list[], as I need to keep the items in the order they were provided. unique_items was a perfect option for that.

Now,

  • if I use set[], the items come back in arbitrary order.
  • if I use list[], there are no unique items in the schema and no validation.

Are there any critical issues with this option? I can try to create a PR to bring it back.
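For the order-preserving case described here, a small pure-Python sketch, assuming hashable items. Both helper names are hypothetical; either function could presumably be attached with `AfterValidator` in the same way as the workaround earlier in the thread, though that wiring is not shown and this sketch covers validation only, not the generated schema.

```python
from typing import Hashable, List, Sequence


def dedupe_keep_order(values: Sequence[Hashable]) -> List[Hashable]:
    """Drop repeats while keeping the first occurrence of each item in place."""
    # dict keys keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(values))


def require_unique_keep_order(values: Sequence[Hashable]) -> Sequence[Hashable]:
    """Reject input with duplicates; return it unchanged, in its original order."""
    if len(set(values)) != len(values):
        raise ValueError("items must be unique")
    return values


print(dedupe_keep_order([5, 5, 4, 4, 3]))    # [5, 4, 3]
print(require_unique_keep_order([3, 1, 2]))  # [3, 1, 2]
```

Unlike `sorted(set(v))`, neither helper reorders the items, so the list keeps the order in which it was provided.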

grabov avatar Aug 08 '23 22:08 grabov

See https://github.com/pydantic/pydantic-core/pull/820#issuecomment-1670475909, I think this solves all of the requirements and will be about as fast as it can be.

adriangb avatar Aug 09 '23 00:08 adriangb

Before we had unique=True, but now we need several lines of code. If this is absolutely necessary, can we at least have some syntactic sugar for it and bring unique back that way?

Note that set is unordered. If we really have to use a set, we will have to use an ordered implementation from outside the standard library.

Edit: I changed my mind, we should follow @adriangb's example.

caniko avatar Nov 14 '23 08:11 caniko

@adriangb, do you think we should leave this open?

sydney-runkle avatar Aug 17 '24 02:08 sydney-runkle

I think this can be closed as a WontFix given the alternatives offered.

adriangb avatar Aug 17 '24 06:08 adriangb

Closing. If someone really wants this built into pydantic-core, we would of course be willing to review a PR.

BTW, https://github.com/pydantic/pydantic-core/pull/820#issuecomment-1670475909 won't work for inputs which are not hashable, so a full solution would have to compare every pair of items in the list for equality, which I think would be O(n^2). More reasons not to implement this in core.
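To make the hashability point concrete, here is a hedged Python sketch of a "full" check: try the hash-based comparison first and fall back to pairwise equality only when an item is unhashable. `all_unique` is a hypothetical name, and the sketch assumes the input is a sequence and that `__eq__` and `__hash__` are consistent for the items involved.

```python
from typing import Any, Sequence


def all_unique(items: Sequence[Any]) -> bool:
    """True if no two items compare equal."""
    try:
        # Fast path: average O(n), but requires every item to be hashable.
        return len(set(items)) == len(items)
    except TypeError:
        # Slow path for unhashable items (lists, dicts, ...): compare every pair, O(n^2).
        return not any(
            items[i] == items[j]
            for i in range(len(items))
            for j in range(i + 1, len(items))
        )


print(all_unique([1, 2, 3]))        # True
print(all_unique([[1], [2], [1]]))  # False
```

For a list of n unhashable items the fallback performs up to n(n-1)/2 equality comparisons, which is the quadratic cost mentioned above.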

samuelcolvin avatar Aug 17 '24 10:08 samuelcolvin