intersection_examples icon indicating copy to clipboard operation
intersection_examples copied to clipboard

Resolution of same name attributes on intersection types.

Open baluyotraf opened this issue 8 months ago • 24 comments

Sorry if this has been discussed before.

Currently, the draft PEP says that the Intersection[...] element order for the intersection does not matter, and create overloads for the intersection types. However the file under notes.txt says that the intersection of class types should follow the MRO, which relies on the order the classes are provided.

In general, this means that if the order and @overloading of the attributes is how the specification should work, the way the typing works is different from multiple inheritance actually works in Python. Given the example below:

class A:
  def __init__(self, a: int):
    self.a = a

class B:
  def __init__(self, a: str, b: int):
    self.a = a
    self.b = b

class AB(A, B):
   # Will use A's __init__ via MRO 

T_AB = A & B # Very different assumptions to the actual multiple inheritance case
# Equivalent to:
class T_AB:
  @overload
  def __init__(self, a: int)
    ...
  
  @overload
  def __init__(self, a: str, b: int)
    ...

Means that the type checker will make assumptions on code interfaces that won't exist during run time as using the second overload will result in an error.

baluyotraf avatar Apr 06 '25 12:04 baluyotraf

Yes, intersections are different from multiple inheritance. Multiple inheritance picks an implementation based on mro, intersections declare compatibility with both uses. This is a case that needs to be detected as incompatible. There's an additional obscuring problem here in that constructors are currently special cased in subclassing. If this were not the case, your multiple inheritance example should error. (a real world case based on it almost certainly does, because B's attributes are missing from AB, and attempts to use it as AB("a", 1), compatibly with B break)

mikeshardmind avatar Apr 06 '25 13:04 mikeshardmind

I agree with your point that multiple inheritance and intersection types are indeed different, from a theoretical perspective.

Which makes me think if we need a full intersection types in Python or would something like a SuperType[...] or MRO[...] should be enough to cover most use cases, as intersections are mostly defined in practice using inheritance.

baluyotraf avatar Apr 06 '25 13:04 baluyotraf

Intersections are useful outside of multiple inheritance, we have motivating examples involving protocol composition (structural subtyping) and class transforming decorators. ((T) -> (T & BehaviorAddedProtocol)) If this was only about inheritance, I agree that we probably wouldn't need full intersections, making type Generic over a typevartuple of types in resolution order would work.

mikeshardmind avatar Apr 06 '25 13:04 mikeshardmind

That's fair actually. And that seems to be more aligned with what's written in the notes.txt which is to follow the MRO for real classes, and actual intersection for Protocols which sort of relates on how the inheritance syntax resolves from both implementation (real classes) and typing perspective (Protocol). This means that Intersection[A, B, C, D, ...] resolves to how class SuperType(A, B, C, D, ...) resolves.

Though as you said, this is actually not how intersection works in theory.

Though going back, I'm not sure which one to actually follow, since the notes.txt and the specification.rst seems to have conflicting inputs.

baluyotraf avatar Apr 06 '25 14:04 baluyotraf

@baluyotraf If you're interested in picking this up again, I might recommend reading this version of the spec too: https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst

We all went back and forth on these quite a bit, and I drafted this up as something that expresses a "happy medium", where we have something that is essentially MRO based, but still allows for the use cases of TypedDict, Protocols and class decorators (some of the points of discussions). Couldn't reach agreement on this, but it might provide a jumping off point for a way forward.

You can also see discussions of the PR here: https://github.com/CarliJoy/intersection_examples/pull/49

mark-todd avatar Apr 09 '25 18:04 mark-todd

From what I understand, a inheritance based approach was also discussed in #38 and the current issue is how to resolve it with Any

baluyotraf avatar Apr 10 '25 21:04 baluyotraf

Sorry, there's a lot of history of discussion on this. The approach in #38 doesn't work.

The short version: Current best is back to fully implementing set-theoretic intersections, where:

T & Any is not a reducible expression T & object reduces to T T & Never reduces to Never

Resolution for overlapping attributes is to intersect the types of the attributes for existing members.

The open questions on this are much fewer now, and the majority of hangups actually involve other poorly defined parts of how typing should handle the python data model. In particular, collections.abc.Hashable has the potential to be a very large nuisance with user denotable intersections (intersections are already in the type system via TypeIs, but they are not user-denotable), because it is removable in subclasses. If this is solved in a way existing contributors to the typing ecosystem are satisfied with, I'd be willing to start drafting a formal proposal for user denotable intersections. The ground work on this blocker was taken over to discourse, though discussion on it left me with the impression that people had some issues with the method I had in mind.

mikeshardmind avatar Apr 10 '25 22:04 mikeshardmind

Thanks for giving me a quick summary of what's been happening so far.

The proposal on Hashable looks good to me as well. If I set __hash__ = None on a class, I'm assuming that the consumer of that class treats that as an unhashable. Other people can of course override the hashing behavior, but as the writer of the base class, the explicit setting of __hash__ = None means that I can add other internal workings inside the object that won't hash properly, making the inheritance of my code unsafe.

I think the reason why removing hashability should not be done is relatively obvious from an inheritance perspective.

Other things on intersections:

  • Required/NotRequired: I guess this just resolves to Required[A & B]? The constructs represent the existence of the field, rather than the type anyway.
  • Annotated: I think this just resolved to Annotated[A & B, <Annotations from A and B>]

I think it would be nice to have a place where the current status has been summarized. I know you guys have done a lot of work and thinking in this matter already. I can put some work to go through everything. I would imagine most of them are in the issues in this repository.

If someone has something that I can start with that would be nice, otherwise start with the discussions I find interesting and start from there.

baluyotraf avatar Apr 11 '25 09:04 baluyotraf

Most up to date discussion is on this issue: https://github.com/CarliJoy/intersection_examples/issues/48 - so I would probably start there. I wouldn't try reading all of the old issues tbh as there's just an extremely large amount of discussion (most of which is now outdated or resolved).

The ordered layers spec was just only way I could find to describe a consistent interpretation at the time - I think based on what Mike's saying we're probably at the point an unordered one would be possible which would be cool!

That said there are sections in the ordered layers spec that mention previous discussions, so it's probably a good way to get the spark notes of the hot topics (without trawling hundreds of comments around Any & :) ). Also each of the headings essentially addresses each of the major issues - it might be now that we have another way to resolve each of these, but you can see at least see the issues we were trying to solve here.

From what I understand, a inheritance based approach was also discussed in https://github.com/CarliJoy/intersection_examples/issues/38 and the current issue is how to resolve it with Any

The Ordered layers version basically came out of the discussion on 38 - I adapted it from one Mike drafted around that discussion. It does present a solution to Any & etc but it may not be the preferable one now.

https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst

mark-todd avatar Apr 11 '25 13:04 mark-todd

Also re Required/NotRequired I think these weren't in Python last time around so may be new discussion.

Re Annotated I think we might have discussed this before but only briefly so might be worthy of a new issue

mark-todd avatar Apr 11 '25 13:04 mark-todd

Required/NotRequired: I guess this just resolves to Required[A & B]? The constructs represent the existence of the field, rather than the type anyway.

We don't have rules for type qualifiers yet. There are a few that are obvious-ish like Final and ClassVar that should just propagate.

Required[A] & NotRequired[B] isn't possible in my interpretation no matter what A and B are due to the nature of structural subtyping, and NotRequired saying it's possible to remove that key/value pair, where Required says it must remain.

mikeshardmind avatar Apr 11 '25 14:04 mikeshardmind

`Required[A] & NotRequired[B] isn't possible in my interpretation no matter what A and B are due to the nature of structural subtyping, and NotRequired saying it's possible to remove that key/value pair, where Required says it must remain.

I would suggest that an example that might be worth considering is something like this:

class A(TypedDict):
    x: Required[int]
    y: str

class B(TypedDict):
    x: NotRequired[int]
    z: str
a: A & B

E.g. in the above, should we ban A & B essentially because the attribute x can't exist? I would think maybe giving x the type Never could work?

mark-todd avatar Apr 11 '25 17:04 mark-todd

A & B needs to be detected as invalid here in some way, otherwise this "looks" fine to the user until they go to access x.

mikeshardmind avatar Apr 11 '25 17:04 mikeshardmind

I can start drafting a rougher form of what should become the formal proposal as it would exist under the assumption the hashability question will have a usable resolution (doesn't need to be the exact resolution linked) which should help serve as a point to refocus rather than ask people to dredge up all of the existing discussion.

mikeshardmind avatar Apr 11 '25 17:04 mikeshardmind

A & B needs to be detected as invalid here in some way, otherwise this "looks" fine to the user until they go to access x.

I suppose I'm thinking about the use case of a user with two large typed dicts let's say, and they want to merge them. Thinking perhaps x -> Never would be more ergonomic? As essentially it would force them to redefine the entire class if the merge is banned.

mark-todd avatar Apr 11 '25 17:04 mark-todd

I can start drafting a rougher form of what should become the formal proposal as it would exist under the assumption the hashability question will have a usable resolution (doesn't need to be the exact resolution linked) which should help serve as a point to refocus rather than ask people to dredge up all of the existing discussion.

Sounds good to me!

mark-todd avatar Apr 11 '25 17:04 mark-todd

I suppose I'm thinking about the use case of a user with two large typed dicts let's say, and they want to merge them. Thinking perhaps x -> Never would be more ergonomic? As essentially it would force them to redefine the entire class if the merge is banned.

Yeah, I'm aware of that problem, but I think we need to start with the sound version and leave this as a known place for ergonomic improvement, such as a way to select specific fields to create a new structural type with from an existing type (see Typescript's pick) to help users there.

mikeshardmind avatar Apr 11 '25 17:04 mikeshardmind

Yeah, I'm aware of that problem, but I think we need to start with the sound version and leave this as a known place for ergonomic improvement, such as a way to select specific fields to create a new structural type with from an existing type (see Typescript's pick) to help users there.

Yeah I agree, ultimately I think getting "v1" out is probably the most important thing - once it becomes established adding extra features I feel like might be easier to get through. We should probably make a section for "future work" though to keep to track of these as we think of them.

mark-todd avatar Apr 11 '25 17:04 mark-todd

Thinking about it, the hash issue is actually bigger, since the docs basically say you can set any magic method to None, with __hash__ only special cased in the sense that it's implicitly set when __eq__ is defined. I guess for things like __iter__ it's fine to just error out, __hash__ is mainly problematic since users actually add/remove support in subclasses…

TeamSpen210 avatar Apr 11 '25 20:04 TeamSpen210

Re hashing, I think __hash__ overrides have to maintain the return type from the supertype or else it's an LSP violation? Notwithstanding that I think basically the rule is:

[1,2,3].__hash__ is None # True

[1,2,3].__fred__ # Other methods that are not implemented return an error

The issue is that __hash__ on classes where it's implemented is actually an attribute of None, whereas where it is implemented it's a method.

I would suggest we special case it and say that if:

class B: # Has a hash method by default
    pass

def test() -> list[int] & B:
    class C(B, list[int]):
        pass
    return C()
 
x = test().__hash__ # Would have the signature from B. In other words the Hashable type wins the intersection

mark-todd avatar Apr 12 '25 13:04 mark-todd

I don't intend to special case it. C there doesn't have a hash method. __hash__ isn't looked up by MRO, and has special behavior associated with it.

>>> class B:  pass
...
>>> class C(B, list):  pass
...
>>> C.__hash__
>>> type(C.__hash__)
<class 'NoneType'>
>>> B.__hash__
<slot wrapper '__hash__' of 'object' objects>
>>>

This is part of it being the only remaining blocker in my view, this particular protocol is pretty important for things like Sets and Dicts, and user-denotability can express invalid intersections in ways that create type errors at runtime.

>>> {B(), C(),}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'C'

It's a pre-existing problem in the type system, but it's very likely to be a real problem with intersections. Some of the sample use cases run into it.

I hope to have a draft state proposal ready by the end of next week, I'm going to take extra time to review everything and make sure all of the edge cases we came up with over this very long discussion have a workable answer, even if that answer is "For now, this is what we can do, but this should be improved as we gain tools for expression". As you said, getting an initial version that people can use is important.

mikeshardmind avatar Apr 12 '25 16:04 mikeshardmind

It's interesting that the __hash__ method is not looked up by the MRO. I can't find anything about it in the Python's data model documentation but is hashability intended to be a separate property or is this missing it out?

I guess my question is, if __hash__ does not follow the standard inheritance rules, should it be assumed that the child of a Hashable class is also one? I think yes is the answer from a theoretical perspective, but it looks like the Python behavior has deviations from it.

baluyotraf avatar Apr 15 '25 17:04 baluyotraf

See https://docs.python.org/3/reference/datamodel.html#object.hash

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None

But object has a pre-defined __hash__()

So MRO doesn't not really work here properly.

CarliJoy avatar Apr 15 '25 23:04 CarliJoy

I do understand how __eq__ and __hash__ interacts. My question is about the __hash__ not being resolved by the MRO.

baluyotraf avatar Apr 16 '25 01:04 baluyotraf