Resolution of same-name attributes on intersection types.
Sorry if this has been discussed before.
Currently, the draft PEP says that the element order of Intersection[...] does not matter, and that overloads are created for the intersected types. However, notes.txt says that the intersection of class types should follow the MRO, which depends on the order in which the classes are provided.
In general, this means that if order-independence and @overload-ing of attributes is how the specification works, the typing behavior differs from how multiple inheritance actually works in Python. Consider the example below:
class A:
    def __init__(self, a: int):
        self.a = a

class B:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b

class AB(A, B):
    # Will use A's __init__ via the MRO
    pass

T_AB = A & B  # Very different assumptions from the actual multiple inheritance case

# Equivalent to:
class T_AB:
    @overload
    def __init__(self, a: int): ...
    @overload
    def __init__(self, a: str, b: int): ...
This means the type checker will assume an interface that won't exist at run time, since calling the second overload will result in an error.
Yes, intersections are different from multiple inheritance. Multiple inheritance picks an implementation based on the MRO; intersections declare compatibility with both uses. This is a case that needs to be detected as incompatible. There's an additional obscuring problem here in that constructors are currently special-cased in subclassing. If this were not the case, your multiple inheritance example would error. (A real-world case based on it almost certainly does, because B's attributes are missing from AB, and attempts to use it as AB("a", 1), compatibly with B, break.)
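The run-time breakage described here can be demonstrated with plain Python, independent of any intersection syntax (this just re-uses the A/B/AB definitions from the example above, without annotations):

```python
class A:
    def __init__(self, a: int):
        self.a = a

class B:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b

class AB(A, B):  # the MRO picks A.__init__
    pass

ab = AB(1)                   # fine: A's signature wins
assert not hasattr(ab, "b")  # B's attribute never gets set
try:
    AB("a", 1)               # the B-compatible call breaks
except TypeError as e:
    print(type(e).__name__)  # TypeError
```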
I agree with your point that multiple inheritance and intersection types are indeed different, from a theoretical perspective.
Which makes me wonder whether we need full intersection types in Python, or whether something like SuperType[...] or MRO[...] would be enough to cover most use cases, as intersections are mostly defined in practice using inheritance.
Intersections are useful outside of multiple inheritance; we have motivating examples involving protocol composition (structural subtyping) and class-transforming decorators ((T) -> (T & BehaviorAddedProtocol)). If this were only about inheritance, I agree that we probably wouldn't need full intersections; making a type generic over a TypeVarTuple of types in resolution order would work.
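For context, protocol composition today has to be spelled with a named Protocol that inherits from both parts (the class names here are illustrative); an intersection would let a decorator declare the composed result inline, with no named class:

```python
from typing import Protocol

class Readable(Protocol):
    def read(self) -> str: ...

class Closable(Protocol):
    def close(self) -> None: ...

# Today: composition requires a named combining protocol.
class ReadableClosable(Readable, Closable, Protocol):
    ...

# With intersections, a decorator could instead return roughly
# (T) -> (T & Closable), without naming the combination.

class File:
    def read(self) -> str:
        return "data"
    def close(self) -> None:
        pass

def use(rc: ReadableClosable) -> str:
    s = rc.read()
    rc.close()
    return s

print(use(File()))  # File matches both protocols structurally
```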
That's fair, actually. And that seems more aligned with what's written in notes.txt, which is to follow the MRO for real classes and actual intersection for Protocols. That sort of relates to how the inheritance syntax resolves from both an implementation perspective (real classes) and a typing perspective (Protocol). This means that Intersection[A, B, C, D, ...] resolves the way class SuperType(A, B, C, D, ...) resolves.
Though, as you said, this is not actually how intersection works in theory.
Though going back, I'm not sure which one to actually follow, since notes.txt and specification.rst seem to have conflicting inputs.
@baluyotraf If you're interested in picking this up again, I might recommend reading this version of the spec too: https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst
We all went back and forth on these quite a bit, and I drafted this up as something that expresses a "happy medium", where we have something that is essentially MRO based, but still allows for the use cases of TypedDict, Protocols and class decorators (some of the points of discussions). Couldn't reach agreement on this, but it might provide a jumping off point for a way forward.
You can also see discussions of the PR here: https://github.com/CarliJoy/intersection_examples/pull/49
From what I understand, an inheritance-based approach was also discussed in #38, and the current issue is how to resolve it with Any.
Sorry, there's a lot of history of discussion on this. The approach in #38 doesn't work.
The short version: Current best is back to fully implementing set-theoretic intersections, where:
T & Any is not a reducible expression
T & object reduces to T
T & Never reduces to Never
Resolution for overlapping attributes is to intersect the types of the attributes for existing members.
The open questions on this are much fewer now, and the majority of hangups actually involve other poorly defined parts of how typing should handle the Python data model. In particular, collections.abc.Hashable has the potential to be a very large nuisance with user-denotable intersections (intersections are already in the type system via TypeIs, but they are not user-denotable), because hashability is removable in subclasses. If this is solved in a way existing contributors to the typing ecosystem are satisfied with, I'd be willing to start drafting a formal proposal for user-denotable intersections. The groundwork on this blocker was taken over to Discourse, though the discussion there left me with the impression that people had some issues with the method I had in mind.
Thanks for giving me a quick summary of what's been happening so far.
The proposal on Hashable looks good to me as well. If I set __hash__ = None on a class, I'm assuming that consumers of that class treat it as unhashable. Others can of course override the hashing behavior, but as the writer of the base class, explicitly setting __hash__ = None means I may add internal workings to the object that won't hash properly, making inheriting from my code unsafe.
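For reference, this is what that opt-out does at run time (class name illustrative):

```python
class Unhashable:
    __hash__ = None  # explicit opt-out: consumers must treat this as unhashable

u = Unhashable()
try:
    hash(u)
except TypeError as e:
    print(e)  # unhashable type: 'Unhashable'
```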
I think the reason why removing hashability should not be done is relatively obvious from an inheritance perspective.
Other things on intersections:
Required/NotRequired: I guess this just resolves to Required[A & B]? The constructs represent the existence of the field, rather than the type, anyway.
Annotated: I think this just resolves to Annotated[A & B, <Annotations from A and B>].
I think it would be nice to have a place where the current status is summarized. I know you've all done a lot of work and thinking on this matter already. I can put in some work to go through everything; I would imagine most of it is in the issues in this repository.
If someone has something I can start with, that would be nice; otherwise I'll start with the discussions I find interesting and go from there.
Most up to date discussion is on this issue: https://github.com/CarliJoy/intersection_examples/issues/48 - so I would probably start there. I wouldn't try reading all of the old issues tbh as there's just an extremely large amount of discussion (most of which is now outdated or resolved).
The ordered layers spec was just the only way I could find to describe a consistent interpretation at the time. Based on what Mike's saying, I think we're probably at the point where an unordered one would be possible, which would be cool!
That said, there are sections in the ordered layers spec that mention previous discussions, so it's probably a good way to get the SparkNotes of the hot topics (without trawling hundreds of comments around Any & :) ). Also, each of the headings essentially addresses one of the major issues. It may be that we now have another way to resolve each of these, but you can at least see the issues we were trying to solve.
From what I understand, an inheritance-based approach was also discussed in https://github.com/CarliJoy/intersection_examples/issues/38 and the current issue is how to resolve it with Any
The ordered layers version basically came out of the discussion on #38; I adapted it from one Mike drafted around that discussion. It does present a solution to Any & etc., but it may not be the preferable one now.
https://github.com/CarliJoy/intersection_examples/blob/intersection-pep-new/docs/specification.rst
Also, re Required/NotRequired: I think these weren't in Python last time around, so they may be new discussion.
Re Annotated: I think we might have discussed this before, but only briefly, so it might be worthy of a new issue.
Required/NotRequired: I guess this just resolves to Required[A & B]? The constructs represent the existence of the field, rather than the type anyway.
We don't have rules for type qualifiers yet. There are a few that are obvious-ish like Final and ClassVar that should just propagate.
Required[A] & NotRequired[B] isn't possible in my interpretation, no matter what A and B are, due to the nature of structural subtyping: NotRequired says it's possible to remove that key/value pair, whereas Required says it must remain.
Required[A] & NotRequired[B] isn't possible in my interpretation, no matter what A and B are, due to the nature of structural subtyping: NotRequired says it's possible to remove that key/value pair, whereas Required says it must remain.
I would suggest that an example that might be worth considering is something like this:
class A(TypedDict):
    x: Required[int]
    y: str

class B(TypedDict):
    x: NotRequired[int]
    z: str

a: A & B

E.g. in the above, should we ban A & B essentially because the attribute x can't exist? I would think maybe giving x the type Never could work?
A & B needs to be detected as invalid here in some way, otherwise this "looks" fine to the user until they go to access x.
I can start drafting a rougher form of what should become the formal proposal as it would exist under the assumption the hashability question will have a usable resolution (doesn't need to be the exact resolution linked) which should help serve as a point to refocus rather than ask people to dredge up all of the existing discussion.
A & B needs to be detected as invalid here in some way, otherwise this "looks" fine to the user until they go to access x.
I suppose I'm thinking about the use case of a user with, say, two large TypedDicts that they want to merge. Thinking perhaps x -> Never would be more ergonomic? Banning the merge outright would essentially force them to redefine the entire class.
I can start drafting a rougher form of what should become the formal proposal as it would exist under the assumption the hashability question will have a usable resolution (doesn't need to be the exact resolution linked) which should help serve as a point to refocus rather than ask people to dredge up all of the existing discussion.
Sounds good to me!
I suppose I'm thinking about the use case of a user with, say, two large TypedDicts that they want to merge. Thinking perhaps x -> Never would be more ergonomic? Banning the merge outright would essentially force them to redefine the entire class.
Yeah, I'm aware of that problem, but I think we need to start with the sound version and leave this as a known place for ergonomic improvement, such as a way to select specific fields to create a new structural type with from an existing type (see Typescript's pick) to help users there.
Yeah, I'm aware of that problem, but I think we need to start with the sound version and leave this as a known place for ergonomic improvement, such as a way to select specific fields to create a new structural type with from an existing type (see Typescript's pick) to help users there.
Yeah, I agree. Ultimately I think getting "v1" out is probably the most important thing; once it becomes established, I feel like adding extra features might be easier to get through. We should probably make a "future work" section, though, to keep track of these as we think of them.
Thinking about it, the hash issue is actually bigger, since the docs basically say you can set any magic method to None, with __hash__ only special-cased in the sense that it's implicitly set when __eq__ is defined. I guess for things like __iter__ it's fine to just error out; __hash__ is mainly problematic since users actually add/remove support in subclasses…
Re hashing, I think __hash__ overrides have to maintain the return type from the supertype, or else it's an LSP violation? Notwithstanding that, I think the rule is basically:

[1, 2, 3].__hash__ is None  # True
[1, 2, 3].__fred__          # Other methods that are not implemented raise AttributeError

The issue is that __hash__ on classes where it's not implemented is actually an attribute set to None, whereas where it is implemented it's a method.
I would suggest we special-case it and say that:

class B:  # Has a hash method by default
    pass

def test() -> list[int] & B:
    class C(B, list[int]):
        pass
    return C()

x = test().__hash__  # Would have the signature from B. In other words, the Hashable type wins the intersection
I don't intend to special-case it. C there doesn't have a hash method; __hash__ isn't looked up by the MRO and has special behavior associated with it.
>>> class B: pass
...
>>> class C(B, list): pass
...
>>> C.__hash__
>>> type(C.__hash__)
<class 'NoneType'>
>>> B.__hash__
<slot wrapper '__hash__' of 'object' objects>
>>>
This is part of it being the only remaining blocker in my view, this particular protocol is pretty important for things like Sets and Dicts, and user-denotability can express invalid intersections in ways that create type errors at runtime.
>>> {B(), C(),}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'C'
It's a pre-existing problem in the type system, but it's very likely to be a real problem with intersections. Some of the sample use cases run into it.
I hope to have a draft state proposal ready by the end of next week, I'm going to take extra time to review everything and make sure all of the edge cases we came up with over this very long discussion have a workable answer, even if that answer is "For now, this is what we can do, but this should be improved as we gain tools for expression". As you said, getting an initial version that people can use is important.
It's interesting that the __hash__ method is not looked up by the MRO. I can't find anything about this in Python's data model documentation. Is hashability intended to be a separate property, or is the documentation missing this?
I guess my question is: if __hash__ does not follow the standard inheritance rules, should it be assumed that the child of a Hashable class is also Hashable? I think yes is the answer from a theoretical perspective, but it looks like Python's behavior deviates from it.
See https://docs.python.org/3/reference/datamodel.html#object.hash
A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None.
But object has a pre-defined __hash__()
So the MRO doesn't really work properly here.
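In CPython, the implicit-None mechanics can be checked directly: the None is inserted into the defining class's own __dict__, which is why an ordinary attribute lookup then finds None rather than object.__hash__ (class names illustrative):

```python
class Plain:
    pass

class WithEq:
    def __eq__(self, other: object) -> bool:
        return isinstance(other, WithEq)

assert Plain.__hash__ is object.__hash__  # inherited from object as usual
assert WithEq.__hash__ is None            # implicitly set when __eq__ is defined
assert "__hash__" in WithEq.__dict__      # the None lives in WithEq's own dict
try:
    hash(WithEq())
except TypeError as e:
    print(e)  # unhashable type: 'WithEq'
```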
I do understand how __eq__ and __hash__ interact. My question is about __hash__ not being resolved by the MRO.