typeshed icon indicating copy to clipboard operation
typeshed copied to clipboard

Annotations for `set.__sub__` and related methods are inconsistent

Open LeeeeT opened this issue 3 years ago • 17 comments

According to the documentation s1 - s2 returns the set of all elements that are in s1 but not in s2. So what's the point of constraining s2 elements type? I believe the correct annotation for the argument of set.__sub__ would be AbstractSet[object] rather than AbstractSet[T | None].

https://github.com/python/typeshed/blob/cf9bdc2d9861cb61e76b512988d19afe232a958f/stdlib/builtins.pyi#L1100

LeeeeT avatar Oct 27 '22 16:10 LeeeeT

You want an error for set[int] - set[str]. Unfortunately there's no way to say "any type that overlaps with _T", and _T | None is an approximation of that to avoid errors in the common case set[int] - set[int | None].

Akuli avatar Oct 27 '22 20:10 Akuli

@Akuli why do you think there should be an error for set[int] - set[str]? The resulting set can only contain ints, regardless of the elements type of the second set.

LeeeeT avatar Oct 28 '22 07:10 LeeeeT

A user shipped code that did set[bytes] - set[str] to production, and of course, it didn't work as expected (#1840). There is no use case for set[int] - set[str]: if your code does it, that's very likely not intentional.

Akuli avatar Oct 28 '22 12:10 Akuli

@Akuli that makes sense. But what about __isub__, __and__, __iand__, intersection, difference, intersection_update and difference_update?

https://github.com/python/typeshed/blob/cf9bdc2d9861cb61e76b512988d19afe232a958f/stdlib/builtins.pyi#L1101

Shouldn't they take AbstractSet[_T | None] (for dunder methods) and Iterable[_T | None] (for others) too? I don't see any logic.

I'm also wondering where this _T | None came from, and why isn't it just _T.

LeeeeT avatar Oct 28 '22 14:10 LeeeeT

__isub__

Good point. In fact, the code from #1840 no longer errors as it should. Apparently this was broken in #7161. @AlexWaygood, what are your thoughts on this?

_T | None

You often want to do set[int] - set[int | None]. The result is a set of integers. There's nothing wrong about it: because it is possible for a set[int | None] to contain integers, it isn't a no-op. Ideally we would say

set[AnythingThatCanBe[_T]]

but that isn't possible, and _T | None is the best approximation of that we have.

I think we should use the _T | None trick for .discard() too. I created #7121 to discuss it but never got around to fixing it.

Akuli avatar Oct 28 '22 18:10 Akuli

@Akuli you mentioned that when we do set[TypeA] - set[TypeB], we want TypeB to be AnythingThatCanBe[TypeA]. But isn't it just any type? We can always create a subtype of TypeB that is also a subtype of TypeA using multiple inheritance.

Consider the following example:

class TypeA: pass

class TypeB: pass

class TypeAB(TypeA, TypeB): pass

ab = TypeAB()

a: set[TypeA] = {TypeA(), ab}
b: set[TypeB] = {TypeB(), ab}

print(a - b)  # {<TypeA object>}
# (isn't no-op)

Even if TypeB is not a subtype of TypeA, set[TypeB] can still contain instances of TypeA. That's not the case for int and str since we can't create a subclass of both of them, but classes with __slots__ are just an exception. In general case, variable of arbitrary type TypeA can be set to an instance of any other arbitrary type TypeB (to an instance of intersection of these two types, to be more precise).

Currently, type checkers complain about the last line, and I believe they shouldn't. What do you think?

LeeeeT avatar Oct 28 '22 19:10 LeeeeT

An intersection instance of str and int cannot exist, for example, because of the memory layout of int and str objects in CPython. They store different kinds of data in the beginning of the structure, and because objects are represented as pointers to the beginning, there's no way to have something contain the datas needed for both.

Here's what the interpreter says if you try:

>>> class Foo(int, str): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict

Edit: Nevermind, you mentioned this already. I get your point, it would be practically almost any type. Maybe we should just stick to _T | None.

Akuli avatar Oct 28 '22 20:10 Akuli

My proposal is to make __sub__ and related methods accept AbstractSet[object]. Passing a set of arbitrary type to these methods would never cause an error at runtime. So I think, it's safe, and it wouldn't affect the existing code because it would make type checkers more lenient. Preventing a user from doing set[TypeA] - set[TypeB], where TypeA and TypeB do not overlap, is also incorrect, because it's not always no-op as I've shown in the above example.

LeeeeT avatar Oct 28 '22 20:10 LeeeeT

I would prefer "practicality beats purity". Even though we usually want to prevent false positives, we also don't want people to ship broken code to production, which IMO is more important than rarely getting extra warnings during development.

Akuli avatar Oct 28 '22 20:10 Akuli

What broken code are you talking about? The code in #1840 may be considered broken because it's no-op. But it's no-op only because we can't create a subclass of both str and bytes. For slotless classes this is not necessarily no-op, and there's no point in disallowing this.

LeeeeT avatar Oct 28 '22 21:10 LeeeeT

By broken code, I mean code that doesn't work as intended, in this case set[Foo] - set[Bar] where the author doesn't intend to deal with objects that are simultaneously Foo and Bar. This is a real problem that users are facing. If you have some production / non-toy-example code that fails to type check because of this, I would be interested to see it.

Akuli avatar Oct 28 '22 21:10 Akuli

When you write set[Foo] or set[Bar] you have to keep in mind that these sets can contain objects that are simultaneously Foo and Bar. The fact that you can't annotate an object that is Foo but not Bar is a different problem. I'd also be glad to see real examples where set[Foo] - set[Bar] could be dangerous :)

LeeeeT avatar Oct 29 '22 11:10 LeeeeT

Another example:

my_ints = {1, 2, 3}
my_objs = {1, None, 'a', 2}

print(my_ints - my_objs)  # {3}

I'm getting an error for the last line. This is pretty common scenario though. How am I supposed to do it differently?

I can write my_ints.difference(my_objs) instead, but this is due to inconsistency between __sub__ and difference (which do exactly the same), and when the inconsistency is fixed, difference won't be an option either.

LeeeeT avatar Oct 30 '22 11:10 LeeeeT

@AlexWaygood What are your thoughts on this issue? Your PR #7161 made the methods inconsistent, and while you argued that Any is better (as in this issue), you apparently didn't consider the "no-op in production" argument from #1840.

Akuli avatar Nov 03 '22 15:11 Akuli

I'll try to take a look at this today, sorry for the radio silence!

AlexWaygood avatar Nov 04 '22 10:11 AlexWaygood

Here's an annoying consequence of this: pyright raises an error if we consider set[LiteralString] - set[str] (pyright-playground):

COLS = ("TIME", "VALUE")
"""Needed columns."""

csv_data: list[list[str]] = [[]]
header = csv_data[0]

if missing_cols := set(COLS) - set(header):  # reportOperatorIssue
    raise ValueError(f"Missing columns: {missing_cols}")

Operator "-" not supported for types "set[Literal['TIME', 'VALUE']]" and "set[str]"

I would agree that in most cases set[T] - set[S] is probably an error. Given the example above, maybe a good compromise is to actually allow the case when S is a supertype of T? This would also cover the requested set[T] - set[T | None] scenario, but not the case when T and S are incomparable, such as set[int] - set[str].

randolf-scholz avatar Mar 08 '24 14:03 randolf-scholz

Not sure if that can be implemented at the moment, I think it may require generic bounds to type-vars (https://github.com/python/typing/issues/548, https://github.com/python/mypy/issues/8278)

class MySet[T](MutableSet[T]):
    @overload  # case A≥B
    def __sub__(self, other: AbstractSet[T]) -> Self: ...
    @overload  # case A≤B
    def __sub__[B, A: B](self: MySet[A], other: AbstractSet[B]) -> MySet[A]: ...

randolf-scholz avatar Mar 08 '24 14:03 randolf-scholz