marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

add a Unique validator

Open bonastreyair opened this issue 3 years ago • 10 comments

At @midokura we have encountered that marshmallow did not have any duplicate validator. We are sharing the implementation we are currently using so that others could benefit from it too.

This new validator is designed to work only with iterables.

In the case of having a value which is a list of objects you can check if one is repeated using an attribute of the object to check if it is repeated or not.

pytests have also been added.

Hope it can be reviewed and at some point included in a future release.

bonastreyair avatar Apr 15 '21 14:04 bonastreyair

I implemented something similar in a project of mine, but never clean enough to share it here.

I had issues, and I think this implem has the same issues, if the iterable contains items that are not hashable because of the set being used.

I ended up using a list rather than a set for the comparison if items are not hashable. I added a hashable argument to the validator to use sets when possible, for performance, but a list would probably be fine in all cases.

This is old and I didn't investigate it much more than this today. Not 100% sure it is relevant. Just bringing this up to avoid letting the issue slip through.

lafrech avatar Apr 15 '21 16:04 lafrech

items that are not hashable

I am tempted to suggest ignoring that use case, but it wouldn't take much code to fall back to the list based membership check when a TypeError occurs.

def __call__(self, value):
    ids = [
        getattr(item, self.attribute) if self.attribute else item
        for item in value
    ]
    try:
        duplicate = self._duplicate_hash(ids)
    except TypeError:
        duplicate = self._duplicate_equal(ids)
    if duplicate:
        raise ValidationError(self._format_error(duplicate))
Side rant

I would be nice if the type system inherited a generic Mutable interface that exposed the add method on lists so that the iterable type could simply be configured by the user.

deckar01 avatar Apr 16 '21 18:04 deckar01

@deckar01 and @lafrech check out your suggested changes

bonastreyair avatar Apr 19 '21 16:04 bonastreyair

Thanks.

This leaves open @deckar01's comment above about get_value. Also, validation happens on load and we don't generally have objects after a load but rather builtin structures (dict, list), so I don't see the point of attribute access.

The docstring and error messages could be reworded for better english but this can be done afterwards.

The non-hashable case is not tested.

We generally don't send back faulty values in error messages. Just the fact that the value is invalid.

Otherwise, I agree on the principle.

lafrech avatar Apr 22 '21 21:04 lafrech

  • Added utils.get_value()
  • Added hashable tests
  • Improved typing

Should we remove the value that is not unique in the error then?

Regarding the use of attribute: Imagine we have a marshmallow schema with a field which is a List of another marshmallow object. Then we may encounter that the elements in that List are objects, and that's why the use of attributes might help to check if the elements on the list are unique.

That solves the case when you have this List with many different objects and an id is assigned to each one of them. If the id is not unique (even if the object is different) the the validation will fail.

bonastreyair avatar Apr 23 '21 08:04 bonastreyair

I understand the need to check the uniticy of a subitem (I managed the same case in an app of mine).

What I'm saying is that unless you use a post-load decorator to instantiate an object, you're dealing with dicts at this stage so getattr shouldn't work. Or am I missing something?

lafrech avatar Apr 23 '21 08:04 lafrech

Actually that's a question I had when dealing with this.

If I set a field of a List[<marshmallow_object>] that uses the proposed Unique validator, the variable value in the function def __call__(self, value) already receives the <marshmallow_object> itself. That's why I set the attribute.

Example:

@dataclass
class Bar:
    custom_id: int


@dataclass
class Foo:
    bar_list: List[Bar] = field(
        metadata={
            "validate": Unique(attribute="custom_id")
        }
    )


foo = {"bar_list": [{"custom_id": 1}, {"custom_id": 1}]}

FooSchema = marshmallow_dataclass.class_schema(Foo)()
FooSchema.load(foo)

This triggers the ValidationError exception:

marshmallow.exceptions.ValidationError: {'bar_list': ['Found a duplicate object attribute (custom_id): 1.']}

Debugging the variables when Unique function gets triggered I see that value already has a List[Bar] and not a List[Dict].

value = [Bar(custom_id=1), Bar(custom_id=1)]

Let me know if I'm also missing something... may it be because I'm using marshmallow_dataclass library to generate the Schemas?

bonastreyair avatar Apr 23 '21 11:04 bonastreyair

may it be because I'm using marshmallow_dataclass library to generate the Schemas?

marshmallow-dataclass does instantiation on load so yes, that would be the reason you have an object at this stage and not a dict.

A test should be added to validate unique "attribute" using dicts, not objects.

lafrech avatar Apr 23 '21 14:04 lafrech

@lafrech dict unittests have been added, also added a nested attribute unittests.

Should object tests be removed?

bonastreyair avatar Apr 30 '21 13:04 bonastreyair

@deckar01 @lafrech is there anything missing before it can be merged? I've just merged from dev to add latest changes

bonastreyair avatar May 13 '21 14:05 bonastreyair