patma Can the compiler fold class patterns?

trafficstars

In short, should we be able to do simple optimizations like executing the pattern C(<p0>) | C(<p1>) as C(<p0> | <p1>) at runtime?

Wearing my implementer hat, my answer is "yes". We've made it clear that patterns don't always follow the same well-defined behavior of expressions, and explicitly noted that e.g. __instancecheck__ calls may be cached. It seems like this could be a nice performance win for some common cases (e.g. Node('+', lt, rt) | Node('-', lt, rt) -> Node('+' | '-', lt, rt)).

But of course there are pathological cases where the act of matching a pattern could re-bind the class name using inspection or some other magic. I don't feel a need to accommodate those... in fact, the optimized behavior would be arguably more correct in these cases.

Thoughts?

Jul 07 '20 05:07 brandtbucher

In Ivan's original proposal, you would not be able to make this assumption because the __match__ method could do literally anything. Because the match protocol included a lot of information about sub-patterns, a class with a custom match method could do wildly different things based on the sub-patterns used.

The various competing proposals for custom matching do allow this sort of optimization because they are more restricted, which means that an optimizer can make more assumptions. There are two conditions which must hold true in order to make this optimization:

That custom match functions are stateless.
That custom match functions are blind to subpatterns.

This means that for any C(X) and C(Y), C is always the same.

Jul 07 '20 06:07 viridia

TL;DR: No, a global variable C is volatile.

The various competing proposals for custom matching do allow this sort of optimization because they are more restricted, which means that an optimizer can make more assumptions. ... This means that for any C(X) and C(Y), C is always the same.

Actually no, it's possible for users to implement a sequence type with a __getitem__ capable of changing global variables or anything else.

Jul 07 '20 09:07 thautwarm

I think it’s fine to optimize the implementation with the assumption that certain things don’t change while we’re in pattern selection mode. Whether we also assume guards to have no side effects is less clear.

The bottom line however must be that if an object lies or cheats, only runtime exceptions may be raised (or incorrect conclusions reached), but no segfaults or bad writes or reads from uninitialized memory can happen. (I think .sort() has a similar rule — if < lies, your data won’t be sorted, but you won’t lose it.)

Jul 07 '20 14:07 gvanrossum

I think it’s fine to optimize the implementation with the assumption that certain things don’t change while we’re in pattern selection mode. Whether we also assume guards to have no side effects is less clear.

I think it's best to invalidate all known information when hitting a guard.

Basically, I'm working on building a decision tree at compile-time, and it's become clear that it's only safe to do so when we stay in "pattern land", where the normal rules don't apply. For example:

match s:
    case <p0> | <p1>: ...
    case <p2> | <p3> if <g0>: ...
    case <p4> | <p5>: ...

It's safe to use the same decision tree for <p0> through <p3>, but it must be rebuilt for <p4> and <p5>, since <g0> could have done literally anything.

Jul 07 '20 15:07 brandtbucher

I realize that we're not doing custom matching yet, but there is a potential one-way door here - we have to make sure that any future custom matching protocol doesn't violate the assumptions made here by the optimizer.

The best possible optimization can be obtained by putting a fairly strict restriction on custom matchers: they have to be pure functions with no side effects. So you aren't allowed to have a customer matcher that increments an internal counter and uses that to decide whether or not to match (I know that's a ridiculous example, but it's the best I could come up with in the moment.)

Personally I think that custom matchers should be pure functions anyway, since it makes them easier to reason about.

If we made a rule that said that any future custom match implementation had to be a pure function, and also that it was not permissible to re-bind the type name once it had been called, then the VM would only need to call the custom match function once for each input value, regardless of how many times that matcher appeared in a given match statement.

Jul 07 '20 20:07 viridia

But we don't have to say anything now. In the future when we're ready to design a custom match protocol we can take care to design the protocol and add constraints to the customization code to allow the optimizer to make reasonable assumptions.

Jul 07 '20 21:07 gvanrossum

I think this is fully pepped now that we have a section "Performance Considerations", right?

Jul 10 '20 00:07 gvanrossum

Yeah. While we don't explicitly come out and say "we can cache name lookups", it logically follows from the rest of the section.

Jul 10 '20 00:07 brandtbucher

Just to toss in my two cents here...

My own proof-of-concept implementation used with statements for each case clause (this allowed me to assign the variables and skip the entire block by raising an exception in the assignment to be caught by the context manager). In a way this means that I compiled each pattern to something like a function with no ifs directly involved in the main logic. Now, I have not a shred of doubt in my mind that Brandt's implementation is far superior. But I think we could argue that there is actually kind of precedent that shows that different implementations and approaches are possible, and that a pattern might be something 'quasi-atomic' that looks up all the names once, goes off to do some magic, and then comes back with the result (match or no match).

For Brandt's example, I basically agree with that <g0> could do anything, invalidating any assumptions about the patterns <p4> and <p5>. However, if the compiler can prove that no subject s could possibly match both <p2> | <p3> and <p4> | <p5>, then the decision tree could basically ignore <g0>. Let me illustrate this with an example:

match s:
    case int(i) if i >= 0: ...
    case tuple((x, y)) if weird_magic(x, y): ...
    case str(): ...

In this example, the two guards could not possibly interfere with the subsequent patterns. At the time a guard expression is evaluated, we already know that none of the other cases could possibly match. The thing is, though, that looking into this kind of optimisations is clealy way beyond the scope of what we are doing. It merely demonstrates that the actual evaluation order might end up being quite complex and not strictly follow a left-to-right-top-to-bottom order.

Personally I think that custom matchers should be pure functions anyway, since it makes them easier to reason about.

I fully agree. It would be quite nice if we could restrict any future custom __match__ protocol to pure functions without side effects. But I am afraid this might not be that easy to really enforce. The best thing we can probably do is to put a seal on this thing and say if you do anything funny, your warranty will be broken.

A final thought:

Patterns clearly have a tree structure. According to the usual left-to-right evaluation rules of Python, we might expect that pattern matching follows a depth-first approach. But we actually envision an algorithm that is free to choose a breadth-first approach instead, or any combination thereof. For instance, if you do not have a stack machine, you don't want to have intermediate values lying around if you can help it. Hence, in case of something like [ C(x, ...), y ], the optimiser might choose to assign the second value to y (getting that out of the way), before tackling the constructor pattern C or go into its subpatterns like x etc.

After all, we should keep in mind that patterns are a completely new thing in that load and store semantics (necessarily) mix and intermingle here. In particular: a pattern is not an expression and hence the usual rules of expressions simply don't apply. Case in point: if you consider something like if f(x := g()):, then the left-to-right rule is also broken in that g() is evaluated before the assignment to x happens, even though x is on the left of g().

Jul 10 '20 15:07 Tobias-Kohn

I’m with you all the way, except for your last sentence. In that case the order is still prescribed by the language spec. But yes, we should resist the urge to prescribe how pattern matching is implemented. We should just constrain it (e.g. no slicing or negative indexing).

Jul 10 '20 15:07 gvanrossum

(Also no getitem without contains check first, to handle defaultdict. That’s worth specifying.)

Jul 10 '20 15:07 gvanrossum

I believe the decision tree should be opaque when executed in the match statement, regardless of the action of any guards or "interesting" classes.

The analogy I'm thinking of is a for statement. One possible implementation of such a statement would be to use a local variable to store the iterator, then use a goto at the end of the loop. This is what JVM bytecode or pretty much any lower-level assembly would do (except presumably with registers). We don't do this in Python for any number of reasons, but certainly if we did, we could do something like next(locals()[that_iterator]) in the body of the loop.

CPython and other implementations don't allow this, by ensuring that the loaded iterator is opaque to Python code in a for statement (and similar usages like generator expressions); this can be done in the same for match.

Note this doesn't imply that there is some transactional view on these loads - all or nothing by the decision tree - when viewed from another thread. So this means that any lookups can be performed as needed. Obviously if the lookup is not cached, something interesting can be done, but hopefully docs would be enough here. Nor does that necessitate a specific bytecode architecture or helper objects - that's left to the implementation.

Jul 10 '20 16:07 jimbaker

@gvanrossum My intention with the last sentence was really to point out that we are mixing two different 'cultures' (rather than saying that it is without or against the specs), so any arguments along the lines of "but all expressions are evaluated like ..." are quite invalid. But you are right: it is certainly not the best example.

@jimbaker I like the idea of an "opaque decision tree" and thinks it is a good way to approach this. So, the decision tree is a black box where we feed in a subject, turn the cranks and see where the result is spit out. Then we apply any guards to check whether we are happy with the result. But here is the point: if we are not happy, we need to put the subject back into the black box and turn the cranks again. And in the middle, the guard might have tampered with our black box...

Even by making the entire matching machinery opaque, there is still the issue that we might have to run more than once during one matching process. And given Python's semantics, we might have to permit the guard to tinker with the machinery—even if we don't like it.

Jul 10 '20 17:07 Tobias-Kohn

Concretely, if we wrote

match s:
    case a.C(x) if x: print("Yes")
    case a.D(): print("No")

there could be a case where a observes that both a.C and a.D are loaded in a case where the first case clause succeeds (including guard) and the output is "Yes". And if the a.D getattr operation somehow messes with anything that would affect the first pattern or guard, we just say it's undefined what happens.

I am fine with that.

Jul 10 '20 17:07 gvanrossum

Interesting: this is quite the other way round of what I was thinking about all along. Great example!

And yes, I totally agree that our semantics should allow that a.C and a.D both be loaded before any actual matching happens.

Jul 10 '20 17:07 Tobias-Kohn

@Tobias-Kohn The abstract semantics of the "black box" model is roughly

Upon entry to the match, everything is frozen with respect to any lookups (attributes, isinstance, hasattr, match_args, ...). This freezing done on every Python entry - standard Python semantics apply, so any classes can be monkeypatched, etc.
All cases are then evaluated with respect to their patterns, ignoring any guards.
Any matching cases are then tested only with respect to their guards (if any), in the order of the cases. Note that we do not reevaluate patterns, we are only evaluating guards. The first matching case that passes its guard (if any, defaults to True) then executes the corresponding arm. So no need to turn again any cranks on the black box model.

I believe that @gvanrossum 's model here is the practical/best effort version where we throw up our hands and say "undefined behavior" if the black box model's semantics is violated (so long as it is not possible to crash Python), because we didn't actually implement this as strictly as the black box model demands. (The black box model is really impractical! Just think of assignment expressions.)

Still there is at least one possible optimization here with respect to all cases evaluated - this is effectively like a computed switch when doing this on say an enumeration. Whether that is really feasible or not, I don't know. I don't think it is in CPython, but it could be in implementations like PyPy where the JIT observes that in this tight loop, the same match statement is being executed repeatedly, as might be the case with a state machine or interpreter; and of course the enumeration is "stable".

Jul 10 '20 18:07 jimbaker

@jimbaker I think I understand now what you are saying.

The black box was not really intended as describing the actual implementation. But even in case of "assignment expressions" I would like to point out that while we borrow the syntax and idea of something that already existists in Python, its actual semantics might slightly differ. So, the walrus operator does not need to bind immediately.

Anyway, let me give an example for your model, just to make sure I get it right. Given the following code:

match s:
    case int(x) if x >= 0: ...
    case int(x) if x < 0: ...
    case _: ...

this would be compiled to something roughly equivalent to:

if isinstance(x := s, int):
    if x >= 0: ...
    elif x < 0: ...
else:
    ...

So, the isinstance(x := s, int) is effectively evaluated only once, irrespective of the guards.

Jul 10 '20 18:07 Tobias-Kohn

Correct, and this is a good example of what can practically be folded.

Jul 10 '20 19:07 jimbaker

As long as it’s also okay to dots things strictly one at a time, repeating redundant lookups.

--Guido (mobile)

Jul 10 '20 20:07 gvanrossum

patma patma copied to clipboard

Can the compiler fold class patterns?

As long as it’s also okay to dots things strictly one at a time, repeating redundant lookups.

patma
patma copied to clipboard