[red-knot] add support for more type narrowing forms
- [x] x is not None
- [x] #13715
- [x] x is not T (where T is any singleton type, incl a class or function)
- [x] x is T
- [x] x != y (where y is of type T, with T a singleton type)
- [x] #13893
- [x] not isinstance(x, T) (where T is a class or tuple of classes)
- [x] apply negated narrowing from previously excluded cases to elif and else blocks
- [ ] x
- [ ] not x
- [ ] bool(E) (where E is any supported narrowing expression)
- [ ] issubclass(x, T) (where T is a class or tuple of classes)
- [ ] not issubclass(x, T) (where T is a class or tuple of classes)
- [ ] match-statement narrowing forms (individual TODOs in code, could break these out)
- [ ] type(x) is C (where C is a class)
- [ ] type(x) is not C (where C is a class)
- [ ] E and F (where E and F are both supported narrowing expressions)
- [ ] E or F (where E and F are both supported narrowing expressions; reduces to (T & C(E)) | (T & C(F)), where T is the prior type of a variable and C(E) / C(F) are the constraints on it obtained from E and F)
This is not an exhaustive list of what we'll eventually want to support; it's just a starter list of some useful forms that aren't currently blocked on other type system features we haven't implemented yet.
(Supporting narrowing on x.attr or x[sub] expression forms is out of scope for this issue.)
Implementation for these lives in crates/red_knot_python_semantic/src/types/narrow.rs.
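For orientation, here is a sketch of mine (not taken from the red-knot test suite, and the exact output red-knot prints may differ) of what a few of the already-supported forms from the checklist above look like in practice, written with reveal_type comments in the same style as the examples later in this thread:

```py
from typing import reveal_type  # Python 3.11+; many checkers also accept it without the import

def f(x: int | None, y: int | str | bytes) -> None:
    if x is not None:
        reveal_type(x)  # revealed: int
    else:
        reveal_type(x)  # revealed: None

    if isinstance(y, bytes):
        reveal_type(y)  # revealed: bytes
    elif isinstance(y, int):
        # negated narrowing from the excluded `bytes` case also applies here
        reveal_type(y)  # revealed: int
    else:
        reveal_type(y)  # revealed: str
```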
These are cases we may want eventually, but we are deferring them for now (please don't add support for these right now):
- [ ] x == None (not sound for most x, don't want to do it unless driven by real use cases)
- [ ] x == T (where T is any literal type) -- not sound for most x; need real use cases (maybe TypedDicts are a use case?)
- [ ] x != T (where T is any literal type) -- use cases not clear yet; could result in complex-but-useless intersections
https://github.com/microsoft/pyright/blob/main/docs/type-concepts-advanced.md#type-guards is pyright's documentation on supported narrowing forms.
Thanks for listing these. There are also pattern constraints that are applied but not yet implemented: https://github.com/astral-sh/ruff/blob/5b4afd30caebd6e2c5c27bbf5debc1663cbb28a2/crates/red_knot_python_semantic/src/types/narrow.rs#L96-L125
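For context on what those pattern constraints would eventually narrow, here is a sketch of mine (not derived from the linked code): class patterns behave roughly like isinstance checks, and value patterns compare with ==, so they carry the same soundness caveats as the equality narrowing discussed later in this thread.

```py
from typing import reveal_type

def handle(x: int | str | None) -> None:
    match x:
        case int():
            # class pattern: behaves roughly like isinstance(x, int)
            reveal_type(x)  # int
        case None:
            # None/True/False patterns compare with `is`, like `x is None`
            reveal_type(x)  # None
        case "hello":
            # value pattern compares with ==, so narrowing toward
            # Literal["hello"] carries the equality caveats discussed below
            reveal_type(x)  # str, conservatively
        case _:
            reveal_type(x)  # str
```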
Is one of the tasks a good first issue? I really like the idea behind red-knot and am looking forward to contributing.
Adding support for x != None should be a very good first issue, since it should have exactly the same effect as x is not None, which is already supported.
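A minimal sketch of the intended equivalence (only the positive branch is shown, since the else branch runs into the deferred x == None case above):

```py
from typing import reveal_type

def f(x: int | None) -> None:
    if x is not None:
        reveal_type(x)  # revealed: int

    if x != None:  # noqa: E711 -- deliberately `!=` rather than `is not`
        reveal_type(x)  # revealed: int (same positive narrowing as above)
```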
Some predicates narrow based on characteristics of an object that could be transient for mutable objects.
For example, taking the if x: branch means that x must be truthy. This means that if x is typed as None, Literal[False], or Literal[0], we can narrow its type to Never, since those types represent known immutable objects that are never truthy and never can be. If x is typed as a union including any of those types, we can safely eliminate them from the union, and it will be important that we can do this. But we cannot generally assume that this narrowing means x will remain truthy. If x is a list, it could be truthy one moment, and the next moment (after a .clear() call), it no longer is.
It is not correct to represent the narrowing effect of if x == 1: as an intersection with Literal[1], because any type in Python can define an __eq__ method that compares equal to the integer 1. So x == 1 does not imply that x must actually be of type Literal[1]. Even if x were previously typed as int, it could be an instance of a subclass of int that compares equal to 1; that's not Literal[1]. And similar to truthiness, we can't try to preserve our knowledge that x == 1 for an arbitrary type of x, because x could be mutable, and its equality to 1 could change at any time after the narrowing, without us being aware of it. But there are some types that we know represent objects that can never become equal to 1: for instance None, or any other literal type that isn't Literal[1]. It will be important that we can narrow away these types.
To sum this up with examples, this is the behavior I think we will want:
```py
x: int | None

if x:
    # this could also be `int & ~Literal[0]`?
    reveal_type(x)  # revealed: int
else:
    # we can't narrow `int` down to `Literal[0]` because of subclasses
    reveal_type(x)  # revealed: int | None
```

```py
x: Literal[0, 1, 2]

if x:
    reveal_type(x)  # revealed: Literal[1, 2]
else:
    reveal_type(x)  # revealed: Literal[0]
```

```py
x: int

if x == 1:
    # not safe to narrow to `Literal[1]`, could be a subclass of `int` with custom `__eq__`
    reveal_type(x)  # revealed: int
else:
    # could be `int & ~Literal[1]`
    reveal_type(x)  # revealed: int
```

```py
x: Literal[1, 2, 3]

if x == 1:
    reveal_type(x)  # revealed: Literal[1]
else:
    reveal_type(x)  # revealed: Literal[2, 3]
```
One way to approach this, which is tempting in its generality (and previously tempted me), is to define new Type variants such as Truthy and EqualTo(T), and then return these types as the constraints on x from x or x == 1. Then we can handle these types in intersection simplification, where e.g. Truthy would be disjoint with None and Literal[False] and some other types, and EqualTo[Literal[1]] would be disjoint from all other literals.
The downsides of this approach are two-fold. One is that we then have to handle these types in all type operations, increasing the size of the matrix for binary operations especially, even though it's not clear that these types have expressive utility outside of narrowing.
The bigger problem is that these types represent possibly-transient characteristics, as discussed above. A particular list[int] may be truthy one moment, and then after a .clear() call, it is no longer truthy. This means that really the only safe use of these types is to immediately eliminate union elements that we know must always be disjoint with them; otherwise we should simply discard them. For example, after x == 1 where x is int, not only can we not narrow x to Literal[1], we shouldn't even keep its type as int & EqualTo(Literal[1]), because it may be that some subsequent method call on x makes it no longer EqualTo(Literal[1]) at all. In the same way, l.clear() would invalidate an intersection like list[int] & Truthy.
I think this is sufficient reason to discard having types like EqualTo or Truthy.
I think the key observation here is that, since there is a limited set of types where we know all inhabitants are immutable with respect to equality and truthiness (all literal and singleton types), the narrowing we can apply in these cases is only a negative intersection to eliminate some of these literal types, never an intersection with a positive type. If the narrowing is represented as an intersection with some constraint type (as we currently represent it), the constraint type must be a negation type (an intersection with only negative elements in it).
This is easy for the if x: case: it can be safely represented as an intersection with ~Literal[0] & ~Literal[False] & ~Literal[""] & ~None. But this approach doesn't work for narrowing if not x: or if x == 1:, because the set of all known-never-to-be-falsy types is too large to represent directly (it includes all literal types that are not 0/False/""/None), and similarly the set of all known-not-to-equal-1 types is very large (all literal types that are not Literal[1]).
We could inspect the current type of x in if x == 1: while inferring narrowing constraints, and just build a negative intersection containing only the relevant types from the too-large-to-represent set of negative constraints. That is, if x is Literal[1, 2, 3] and we have if x == Literal[1]:, in theory we'd want to intersect x with the too-large-to-represent type "all literal types that are not Literal[1]", but in practice we could intersect it simply with ~Literal[2] & ~Literal[3] and get the same result. This muddies the boundaries between inferring a constraint and applying it. But this may be a totally fine pragmatic approach.
A variant of this could be to have constraints no longer be represented simply as a Type to intersect with, but as a new enum that might be just a Type to intersect with, or might be something like EliminateNotEqualTo[Literal[1]] or EliminateNonTruthy, and then we'd apply those constraints later, in type inference, when we already have the type of x in hand. The actual application would look similar to what we'd do in the previous paragraph, we just wouldn't do it eagerly when inferring the constraint itself, so as to keep "inference of constraints from a test expression" and "application of constraints to a type" more cleanly separated. Type could even gain dedicated methods like exclude_never_equal_to(Type) and exclude_always_truthy to handle intersecting a type with these too-large-to-represent negations.
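To make that concrete, here is a rough Python sketch of how such an application step might behave. The name exclude_never_equal_to comes from the paragraph above, but modeling a union as a plain list of members is purely illustrative and not how the Rust Type works:

```py
def exclude_never_equal_to(members: list[object], value: object) -> list[object]:
    """Drop union members that are literals known to never compare equal to `value`.

    Non-literal members (modeled here as type objects like `int`) are kept,
    because an arbitrary instance could define __eq__ to equal anything.
    """
    def is_known_literal(m: object) -> bool:
        return m is None or isinstance(m, (bool, int, str, bytes))

    return [m for m in members if not (is_known_literal(m) and m != value)]

# x: Literal[1, 2, 3] narrowed by `if x == 1:` -- only Literal[1] survives
print(exclude_never_equal_to([1, 2, 3], 1))    # [1]

# x: int | None narrowed by `if x == 1:` -- None is eliminated, int is kept
print(exclude_never_equal_to([int, None], 1))  # [<class 'int'>]
```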
One practical reason to prefer the latter (placing this logic in Type instead of directly inside constraint inference) is that there are other places we may want to narrow a type in this way, where we aren't using definitions and constraints. One such case is https://github.com/astral-sh/ruff/issues/13632
EDIT for posterity: in the end in #14687 we did decide to add Type::AlwaysTruthy and Type::AlwaysFalsy, but with a narrower definition. These types are only inhabited by objects which are always guaranteed to return Literal[True] or Literal[False] from __bool__, never by objects which may return different values at different times. That addresses the unsoundness issues discussed above. We can only ever intersect with these types as a negated type; so e.g. we intersect the type of x with ~AlwaysFalsy, allowing us to eliminate known-always-falsy types (e.g. Literal[False] or Literal[0] or an instance of a class with def __bool__(self) -> Literal[False]) from the type of x.
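As a rough illustration of that narrower definition (the exact display of the revealed type may differ):

```py
from typing import Literal, reveal_type

class AlwaysFalse:
    # instances always return Literal[False] from __bool__, so the instance
    # type is an inhabitant of AlwaysFalsy
    def __bool__(self) -> Literal[False]:
        return False

def f(x: str | None | Literal[0] | AlwaysFalse) -> None:
    if x:
        # intersecting with ~AlwaysFalsy eliminates None, Literal[0], and
        # AlwaysFalse; `str` stays, since only some strings are falsy
        reveal_type(x)  # roughly: str (possibly displayed as str & ~AlwaysFalsy)
```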
"Not isinstance" and "elif else" can be checked off ✅
I think we also need to handle TypeIs[T] and TypeGuard[T] for narrowing. (blocked by the implementation of generics for now)
Although the implementation will likely come much later, I thought it would be useful to list it here. Once completed, we should be able to effectively test it using various functions from the inspect module (e.g., isclass, isfunction).
Yeah, I was thinking of TypeIs and TypeGuard as a separate feature, but it doesn't hurt to list it here.
I don't think it is blocked on generics. These are just special forms, which use bracket syntax, but they don't require user-defined generic classes or functions.
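For reference, a sketch of the kind of user-defined TypeIs narrowing being discussed here (TypeIs is in typing on Python 3.13+, and in typing_extensions before that); the is_str_list helper is just the standard example, not anything from red-knot:

```py
from typing_extensions import TypeIs, reveal_type

def is_str_list(val: list[object]) -> TypeIs[list[str]]:
    return all(isinstance(x, str) for x in val)

def f(val: list[object]) -> None:
    if is_str_list(val):
        reveal_type(val)  # list[str]
    else:
        reveal_type(val)  # list[object] (strictly, list[object] & ~list[str])
```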
Another possible narrowing pattern:
```py
a: LiteralString

if a == "foo":
    reveal_type(a)  # revealed: Literal["foo"]
```
I'm looking at evaluate_expr_compare now, and in the ast::Expr::Name arm the remaining CmpOp variants are Lt, LtE, Gt, and GtE. Are these arms that we intend to cover? It seems like we probably can't easily implement any narrowing for these. @carljm
https://github.com/astral-sh/ruff/blob/eb3e1763096bd85b37f1609b6baf1e10694143cf/crates/red_knot_python_semantic/src/types/narrow.rs#L416-L418
Edit: From https://github.com/microsoft/pyright/blob/main/docs/type-concepts-advanced.md#type-guards, pyright supports len(x) < L where x is a tuple. I suppose I can add this?
Sure, adding the length narrowing on tuples would be fine, though I think relatively low priority. I agree that otherwise there's probably no narrowing we can do on less-than or greater-than compares.
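For reference, the tuple-length narrowing in question would look roughly like this (a sketch of pyright-style behavior, not current red-knot output):

```py
from typing import reveal_type

def f(t: tuple[int] | tuple[int, int] | tuple[int, int, int]) -> None:
    if len(t) < 2:
        reveal_type(t)  # tuple[int]
    else:
        reveal_type(t)  # tuple[int, int] | tuple[int, int, int]
```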
One kind of narrowing not mentioned here (we could create a new issue for it) that would have a significant impact on false positives is narrowing on assert statements. This doesn't involve narrowing on any new kinds of test expressions; it just means adding a narrowing constraint in SemanticIndexBuilder when we hit an assert statement.
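A minimal sketch of what that assert narrowing buys (no new test-expression forms, just applying the existing constraint at the assert):

```py
from typing import reveal_type

def f(x: int | None) -> None:
    assert x is not None
    # past the assert, the same constraint as the true branch of
    # `if x is not None:` applies
    reveal_type(x)  # revealed: int
```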
Cool, sounds good, I'm up for that
I'm going to go ahead and close this issue; we've done most of the things listed here, and I think there's limited value at this point in having a blanket tracking issue for narrowing. We can create new issues as we decide to prioritize new kinds of narrowing.