marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Use OrderedSet as default set_class

Open lafrech opened this issue 2 years ago • 7 comments

Use OrderedSet as default set_class.

Closes #1744.

~We could do that already but I'd rather wait until we drop Python 3.6 so that the change is easier to document: all users will be affected.~

No need to change ordered at this stage. In marshmallow 4 we can do the same with dict_class and let users override if they really want an OrderedDict.

This change generates mypy issues. I'd be happy to get help on this.

lafrech avatar Oct 27 '21 12:10 lafrech

I can't seem to find a way to silence mypy, here. Help welcome.

lafrech avatar Mar 06 '22 19:03 lafrech

Mypy error for those curious:

src/marshmallow/schema.py:919: error: Incompatible types in assignment (expression has type "OrderedSet", variable has type "Optional[Union[Sequence[str], Set[str]]]")
src/marshmallow/schema.py:924: error: Incompatible types in assignment (expression has type "OrderedSet", variable has type "Set[Union[Any, str]]")
src/marshmallow/schema.py:972: error: Incompatible types in assignment (expression has type "AbstractSet[Any]", variable has type "OrderedSet")

The first error is because self.only is configured to be StrSequenceOrSet | set_class | None so when you assign an instance variable to it, it's no longer valid as that type, as StrSequenceOrSet is defined as StrSequenceOrSet = typing.Union[typing.Sequence[str], typing.Set[str]] which doesn't match the MutableSet being subclassed (set vs. mutable set being the key error here).

Updating StrSequenceOrSet to the following fixes the first issue:

StrSequenceOrSet = typing.Union[
    typing.Sequence[str], typing.Set[str], typing.MutableSet[str]
]

The second one is similar in nature, self.exclude is inferred as Set[Any] because of:

        self.exclude = set(self.opts.exclude) | set(exclude)

To address this, you need to allow mutable sets because that's what the OrderedSet is subclassing.

        self.exclude: typing.Set[typing.Any] | typing.MutableSet[typing.Any] = set(
            self.opts.exclude
        ) | set(exclude)

Finally, to address the last issue you need to tell Python that the field_names is ok being less specific (an AbstractSet rather than either a Mutable or immutable set:

            field_names: typing.AbstractSet[typing.Any] = self.set_class(self.only)

With that, you'll pass all tests. Though, the OrderedSet is still only partially defined as it's missing the generic parameters and whatnot

kkirsche avatar Aug 23 '22 12:08 kkirsche

I should note, per https://docs.python.org/3/library/typing.html typing.AbstractSet is preferred over typing.Set when possible:

To annotate arguments it is preferred to use an abstract collection type such as AbstractSet.

This is because AbstractSet is the base collection object and doesn't limit you to a specific type of set. If you need more specific behaviors though, that's when reaching for MutableSet or Set can be helpful. This aligns with how it's a best practice not to accept things like List[str] or Dict[str, str] as parameters, instead preferring the protocols like Collection[str] and Mapping[str, str] which allow for various types to satisfy their constraints

kkirsche avatar Aug 23 '22 12:08 kkirsche

https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes can help understand the hierarchy of base collection types, https://docs.python.org/3/library/typing.html#typing.AbstractSet recommends collections.abc.Set as of 3.9, to connect that back to it

EDIT: connected how AbstractSet relates here

kkirsche avatar Aug 23 '22 12:08 kkirsche

This is a rough re-write of OrderedSet with type information. I didn't make a PR as I'm not confident enough in how map and end are being used to ensure that the types are correct. The use of lists instead of tuples makes typing a little more difficult as well, because we can't type locations in a list while we can with a tuple.

# OrderedSet
# Copyright (c) 2009 Raymond Hettinger
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation files
# (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge,
# publish, distribute, sublicense, and/or sell copies of the Software,
# and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
#     The above copyright notice and this permission notice shall be
#     included in all copies or substantial portions of the Software.
#
#     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
#     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
#     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
#     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
#     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
#     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
#     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
#     OTHER DEALINGS IN THE SOFTWARE.
from collections.abc import MutableSet
from typing import Optional, Generator, Iterable, TypeVar

_Key = TypeVar("_Key")


class OrderedSet(MutableSet[_Key]):
    def __init__(self, iterable: Optional[Iterable[_Key]] = None) -> None:
        self.end = end = []  # type: ignore
        end += [None, end, end]  # sentinel node for doubly linked list
        self.map = {}  # key --> [key, prev, next] # type: ignore
        if iterable is not None:
            self |= set(iterable)

    def __len__(self) -> int:
        return len(self.map)

    def __contains__(self, key: object) -> bool:
        return key in self.map

    def add(self, key: _Key) -> None:
        if key not in self.map:
            end = self.end
            curr = end[1]
            curr[2] = end[1] = self.map[key] = [key, curr, end]

    def discard(self, key: _Key) -> None:
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[2] = next
            next[1] = prev

    def __iter__(self) -> Generator[_Key, None, None]:
        end = self.end
        curr = end[2]
        while curr is not end:
            yield curr[0]
            curr = curr[2]

    def __reversed__(self) -> Generator[_Key, None, None]:
        end = self.end
        curr = end[1]
        while curr is not end:
            yield curr[0]
            curr = curr[1]

    def pop(self, last: bool = True) -> _Key:
        if not self:
            raise KeyError("set is empty")
        key: _Key = self.end[1][0] if last else self.end[2][0]
        self.discard(key)
        return key

    def __repr__(self) -> str:
        if not self:
            return f"{self.__class__.__name__}()"
        return f"{self.__class__.__name__}({list(self)!r})"

    def __eq__(self, other: object) -> bool:
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        if isinstance(other, Iterable):
            return set(self) == set(other)
        return False


if __name__ == "__main__":
    s = OrderedSet("abracadaba")
    t = OrderedSet("simsalabim")
    print(s | t)
    print(s & t)
    print(s - t)

Hope this helps :)

kkirsche avatar Aug 23 '22 12:08 kkirsche

Thank you so much for looking into this!

Regarding the OrderedSet type hints, feel free to send a PR even as draft, if only for history. Also, you might want to have a look at ordered-set lib, see how they did id: https://github.com/rspeer/ordered-set/blob/master/ordered_set/init.py (and perhaps this SO question: https://stackoverflow.com/questions/52569356/provide-orderedsetint-like-types-using-stub-file-without-modifying-ordered-set).

lafrech avatar Aug 23 '22 13:08 lafrech

Thanks, submitted a few PR's to her repo to address some incorrect type usage (specifically, incorrect usage of Any). If you're curious, this comment explains the difference between using Any and object as a parameter: https://github.com/python/typeshed/pull/8469#issuecomment-1203116838

kkirsche avatar Aug 23 '22 18:08 kkirsche

I think this is ready to go.

I removed the section about ordered from the quickstart because it is outdated and the option is now only required for a very specific purpose: using an OrderedDict for output, which doesn't need to be documented in a quickstart.

Not sure how and where to document ordered, now.

Does this change warrant an "upgrading to ..." section?

lafrech avatar Dec 30 '22 23:12 lafrech

TODO:

  • [x] Add test showing how to specify OderedDict as a dict class as class attribute.
  • [ ] Deprecate ordered
  • [ ] Deprecate dict_class argument of get_declared_fields

lafrech avatar Jul 02 '23 17:07 lafrech

test_options.py is quite focused on the ordered option and deprecating it would require changes there. Besides, those tests would need to be kept to ensure they don't break until 4.0, and I'm not sure how to silent the deprecation warnings that would be raised.

I don't want to engage in this right now. We can deal with this later. In fact, we can even skip the deprecation phase and remove the parameter and those tests in 4.0.

@sloria I shall merge/release this soonish. Ping me if you want time to review.

lafrech avatar Jul 16 '23 20:07 lafrech