typing
typing copied to clipboard
Type guard doesn't intersect types like instance check
The type guard doesn't seem to work like an instance check:
from typing import TypeGuard, Any
class X: pass
class Y: pass
x: X = X()
assert isinstance(x, Y)
def f(z: Any) -> TypeGuard[Y]:
return isinstance(z, Y)
reveal_type(x) # Revealed type is "a.<subclass of "X" and "Y">", which is great!
x2: X
assert f(x2)
reveal_type(x2) # Revealed type is "a.Y", which is unfortunate.
This is causing problems with the definition of dataclasses.is_dataclass
https://github.com/python/typeshed/issues/9723.
The above code was tested with MyPy and Pyright.
This isn't a bug, it's precisely how TypeGuard is specified (https://peps.python.org/pep-0647/#typeguard-type: "the first positional argument to the type guard function should be assumed by a static type checker to take on the type specified in the TypeGuard return type"). A separate mechanism will need a new PEP.
Thanks for the clear explanation. Do you think it's a good idea for me to propose this in typing-sig?
Should probably read https://mail.python.org/archives/list/[email protected]/thread/JTO3WKRKQFW5YFP3ZSMNTRIEA5NPJEM6/ first. I think Eric has a well-developed proposal but hasn't pushed it forward because of insufficient interest.
Okay, I've read that thread. It's not something I'm used to thinking about, so it takes me a bit longer. The proposal in my issue is for the positive case to do narrowing. It seems like Erik's proposals are about what to do in the negative case. (@msfterictraut what do you think?)
Also, regarding MyPy specifically, would this be easy for you to do since you're calling some kind of narrowing function for isinstance
, but replacing the type for type guards? Ideally, it would be as easy as narrowing for type guards.
@NeilGirdhar, it's not clear to me what you're proposing, so I can't offer any thoughts. Can you clarify what you're proposing?
Also as Jelle noted, TypeGuard
is working as designed in your code above. There has been some discussion of adding an alternate form (StrictTypeGuard
) that has different semantics and behaviors from the existing TypeGuard
. This new form would be less flexible than TypeGuard
but would allow for narrowing in the negative case and union narrowing in general. Even with StrictTypeGuard
, you won't get the same behaviors as built-in type guards like isinstance
. Those built-in type guards contain lots of custom logic based on detailed knowledge of the calls or operators used in the conditional expression. For example, the isinstance
type guard can produce an intersection type as shown in your code sample above. A type checker can't make the same assumptions about a user-defined type guard.
If you want the semantics of an isinstance
type guard, then you should use isinstance
directly rather than attempting to wrap it in a user-defined type guard function.
Thanks Erik for your detailed reply!
Can you clarify what you're proposing?
I would like asserting on the type guard to narrow the argument type to the intersection of its original type X
and the type guard type Y
: X & Y
. MyPy seems to do this for isinstance
, but not for type guards.
This would fix the problem whereby is_dataclass(X)
narrows to type[DataclassInstance]
and loses X
. I would like it to narrow to X & type[DataclassInstance]
.
For example, the isinstance type guard can produce an intersection type as shown in your code sample above. A type checker can't make the same assumptions about a user-defined type guard.
Why not?
If you want the semantics of an isinstance type guard, then you should use isinstance directly rather than attempting to wrap it in a user-defined type guard function.
How do we do that with is_dataclass
?
There has been some discussion of adding an alternate form (
StrictTypeGuard
) that has different semantics and behaviors from the existingTypeGuard
. This new form would be less flexible than TypeGuard but would allow for narrowing in the negative case and union narrowing in general.
I read your strict type guard idea. I'm not sure if it's related. I'm not interested in the negative case. What's 'union narrowing"?
I read your strict type guard idea. I'm not sure if it's related.
StrictTypeGuard
is another form of user-defined type guard with slightly different semantics. It will not help with the issue you're posing here, so you can ignore it.
Why not?
Because isinstance
has very specific semantics and behaviors. A static type checker cannot assume that a user-defined type guard function has those same behaviors. I suppose it would be theoretically possible to define a special IsinstanceTypeGuard
that replicates the isinstance
type guard semantics exactly, but I don't see why that would be useful when we already have isinstance
. If you want the isinstance
type guard semantics, just use isinstance
in your conditional statement.
How do we do that with is_dataclass?
I don't think I can answer that question because I don't know enough about is_dataclass
or what you're trying to do. Is is_dataclass
implemented internally with a simple isinstance
check? If so, why was a new API created rather than telling developers to simply use isinstance
if they want to check for a dataclass? I'm guessing that its behaviors are somewhat different (perhaps more complex) than an isinstance
check.
I don't think I can answer that question because I don't know enough about
is_dataclass
or what you're trying to do. Isis_dataclass
implemented internally with a simpleisinstance
check?
Sorry, I should have linked it: is_dataclass is a standard library function that checks if an object is a dataclass instance, or if a type is a dataclass.
That's why we can't simply use isinstance
. I guess we could put is_dataclass
into an ABC. __subclasshook__
, and then use an ordinary isinstance
on that to recover the desired behaviour (I tested that here).
However, it's not clear to me why that's necessary. When a type checker sees assert f(x2)
(from the code in the issue), why can't it narrow x2
's type the same way it did when it sees assert isinstance(x2, Y)
?
Is
is_dataclass
implemented internally with a simpleisinstance
check?
It's a library function. It think it checks for various attributes. It can't check isinstance
since dataclasses have no base class. (A design error IMO, but too late to complain.)
When a type checker sees assert f(x2) (from the code in the issue), why can't it narrow x2's type the same way...
That's not how TypeGuard
works. I won't explain in detail why, but if you look at some of the examples in PEP 647, you'll quickly see why it would be completely inappropriate for TypeGuard
to imply "intersection".
I'll also note that there is currently no notion of an intersection in the Python type system. If and when such a concept is added, you could use a TypeGuard
that explicitly uses an intersection. That said, adding intersection support to the Python type system is not straightforward. The discussions I've seen so far seem to be leaning toward very constrained uses, and it's unlikely that these would be sufficient for this use case.
That's not how
TypeGuard
works. I won't explain in detail why, but if you look at some of the examples in PEP 647, you'll quickly see why it would be completely inappropriate forTypeGuard
to imply "intersection".
Sorry, but I don't see it. Consider the is_set_of
function. Suppose you have x: set[X]
, and then you assert on is_set_of(x, Y)
. Currently, that makes x: set[Y]
. Your argument is that it would be inappropriate to infer x: set[X] & set[Y]
? But this seems to contradict PEP 647, which says "Type checkers should assume that type narrowing should be applied to the expression that is passed as the first positional argument to a user-defined type guard." And set[Y]
is not narrower than set[X]
whereas the intersection is narrower.
Also, you could wrap up any single-class type guard f
in an ABC.__subclasshook__
, and then switch from asserting on the type guard to asserting it's an instance of your ABC. Then, you get the intersection behavior (in Mypy and Pyright at least). So, I don't see why it's "completely inappropriate".
I'll also note that there is currently no notion of an intersection in the Python type system.
Yes, I understand that. However, both Pyright and Mypy seem to have a rudimentary intersection that they seem to use with instance assertions.
If you read PEP 647, you will not see the word "intersection" anywhere within the spec. When I wrote the PEP, it didn't even occur to me to include intersections for several reasons. First, there is no such concept in the Python type system today. Second, intersections don't make sense for many of the use cases I envisioned for TypeGuard
. Third, the PEP was loosely based on the user-defined type guard facility in TypeScript, and it doesn't use intersections (even though TypeScript does have such a notion in its type system).
I'm not sure if you're arguing that a reading of PEP 647 implies intersections or that you wish it had. In either case, I think the point is moot because it doesn't. The spec is final, and mypy, pyright and other type checkers implement TypeGuard
as it is spec'ed.
You're welcome to propose a new variant of TypeGuard
that does use intersections, but that would need to be done as a new PEP, and it would need to go through the normal discussion and review process. I'll also note that any proposal that depends on intersections would necessarily need to define and spec the concept of intersection types in the Python type system — or wait for another PEP to introduce such a concept.
To make my point about intersections being inappropriate for TypeGuard
, let's take a look at the first example in PEP 647:
def is_str_list(val: List[object]) -> TypeGuard[List[str]]: ...
You're arguing that the narrowed type should be List[object] & List[str]
(i.e. an intersection between the input type and the type guard type) rather than List[str]
. I'm not sure what List[object] & List[str]
even means. Presumably it would mean that the resulting type is compatible with both List[object]
and List[str]
and can be used anywhere either of these types can be used. It would also imply that you can append None
to it, since None
is an object
.
The issubclass
logic in pyright and mypy make use of an "anonymous subclass" that derives from two base classes. I guess you could call this a rudimentary intersection, but it's not really the same as an intersection, and there are limits to this approach. For example, you cannot express set[X] & set[Y]
using this technique because it's not possible to subclass from the same generic class multiple times, and even if you could, only one of the subclass' methods would be visible due to MRO rules.
The PEP is not as clear as it could be. The phrase "takes on the type" is used informally and never defined in the PEP. It might mean that it has exactly the type, or it might mean that it has that type in addition to its other types (like isinstance). But I agree that if you step back from the PEP you'll realize that it unfortunately can't be an intersection.
I say unfortunate, because we can't "combine" user defined typeguards like this:
if is_serializable(x):
if is_dog(x):
# We've forgotten it's serializable.
elif is_cat(x):
# Forgotten serializable.
The simple technical reason it can't be an intersection is because the PEP wants to allow type guards that don't make sense statically, like the motivating is_str_list
example.
A simpler example:
def is_int(x: None) -> TypeGuard[int]: ...
Here the intersection None & int
is empty so we've got to forget it's None
in order to treat it like an int
.
About list[object] & list[str]
, that's an empty intersection. It's a paradoxical list. You can only get strings out of it but you can put anything in. That's why list
is invariant. A list[object]
is never a list[str]
and you can't even check at runtime.
For @NeilGirdhar, you could try to sidestep user-defined type guards. Instead of user-defined type guards, you could try user-defined downcast functions. These return their argument unchanged or None
and should work without any new type system features.
You can cast a type to a consistent subtype ("strict narrowing") by returning the argument unchanged like:
def as_str_list(val: list[Any]) -> list[str] | None:
return val if all(isinstance(x, str) for x in val) else None
And you will get narrowing for free by using it like:
ss = as_str_list(xs)
if ss:
# ss is list[str]
You can also write the unsafe version by using typing.cast:
def unsafe_as_str_list(val: list[object]) -> list[str] | None:
if all(isinstance(x, str) for x in val):
return cast(list[str], val)
Which has the feature that the library writer has to acknowledge that the cast doesn't make sense statically. With type guards, you can accidentally write something that you think is safe but it's not, like the example from the PEP.
You still don't get intersections like you do with is_instance, so you have to remember which name to use. You have to write:
x: X
d = as_dataclass(x)
if d:
# Use d for the dataclass and x when you need type X.
[Edit: fixed type annotation.]
The
issubclass
logic in pyright and mypy make use of an "anonymous subclass" that derives from two base classes. I guess you could call this a rudimentary intersection, but it's not really the same as an intersection, and there are limits to this approach.
That's interesting. I wondered how you did it. Here you have to be careful not to depend on the order of the bases though. Pyright accepts this:
class A:
def f(self, x: int) -> int: return x
class B:
def f(self, x: int, y: int) -> int: return y
def test(x: B) -> None:
if isinstance(x, A):
reveal_type(x)
x.f(0, 1)
and reveals Type of "x" is "<subclass of B and A>"
. Swap the order of the types:
def test(x: A) -> None:
if isinstance(x, B):
reveal_type(x)
x.f(0, 1)
and reveals Type of "x" is "<subclass of A and B>"
with an error Expected 1 positional argument (reportGeneralTypeIssues)
.
These cases should be symmetrical.
Mypy is doing something more subtle. It accepts both without errors. But it doesn't reveal any type, which does indicate that it knows that code is unreachable --- there can be no subclasses of both A
and B
. [Edit: I mean statically. The runtime will allow it as long as the MRO is consistent.] It does statically reject any attempt to define such a class, with the bases in either order, because of the incompatible override.
It appears more like it's treating this as something symmetrical like Union[subclass of A and B, subclass of B and A]
.
You're arguing that the narrowed type should be
List[object] & List[str]
(i.e. an intersection between the input type and the type guard type) rather thanList[str]
.
Just to clarify this: if the argument to is_str_list
had static type T
then the narrowed type "should be" T & list[str]
. The parameter type of the type guard isn't relevant. That is: if we knew before the call that it was statically a T
, then we still know that after the call and also that it's a list[str]
.
list
isn't the best example here because it's invariant.
@erictraut Thanks for your detailed reply. I've learned a lot from your github comments over the years and I always appreciate them.
To make my point about intersections being inappropriate for
TypeGuard
, let's take a look at the first example in PEP 647:def is_str_list(val: List[object]) -> TypeGuard[List[str]]: ...
You're arguing that the narrowed type should be
List[object] & List[str]
(i.e. an intersection between the input type and the type guard type) rather thanList[str]
.
PEP 647 says "Type checkers should assume that type narrowing should be applied to the expression that is passed as the first positional argument to a user-defined type guard." The way I read it was that it is saying that it doesn't matter what the parameter annotation (list[object]
) is. What matters is the type of "the expression that is passed in", i.e., the argument type (T
). I realize that that may not have been what you intended, but it seems to be a reasonable reading of what's written? This fits with what @kmillikin explains here.
I'm not sure what
List[object] & List[str]
even means. Presumably it would mean that the resulting type is compatible with bothList[object]
andList[str]
and can be used anywhere either of these types can be used.
Exactly 😄
If we go to the top of this issue. It was submitted to get is_dataclass
working in an elegant way. Imagine you have something like:
x: T
assert is_dataclass(x)
Ideally, this would result in x
having type T & DataclassInstance
. Under the current system, either x
has type T
or x
has type DataclassInstance
. Information is necessarily lost. I realize that intersection doesn't exist yet. In the case that a type guard checks a basic type like DataclassInstance
, then type checkers like MyPy could apply the logic they use for instance checks and return '"anonymous subclass" that derives from two base classes'.
However, that seems like too much work for a partial solution. In an ideal world, I believe they would return the intersection of the argument type and the check type. What do you think?
Here the intersection None & int is empty so we've got to forget it's None in order to treat it like an int.
If you need type forgetting behaviour, can you not cast to Any?
Instead of user-defined type guards, you could try user-defined downcast functions…
This was extremely clever! Thanks for teaching me something new.
The new PEP 724 proposes:
- narrowing types when a user-defined type guard function returns False, and
- applying additional (more precise) type narrowing under certain circumstances when the type guard function returns True.
(The second part is this issue.)
Basedmypy has typeguards that intersect, read more here: https://kotlinisland.github.io/basedmypy/based_features.html#reinvented-type-guards
I think this can be closed now that TypeIs
is part of the typing spec.
For future reference, adding a link to the section on TypeIs
.