patma icon indicating copy to clipboard operation
patma copied to clipboard

Can the compiler move guards around?

Open gvanrossum opened this issue 4 years ago • 7 comments

Since we only allow guards at the top level, one is sometimes required to write things like

case [x, y, z] if x >= 0:

Here presumably the compiler generates something like

  • check for sequence
  • check length == 3
  • extract x
  • extract y
  • extract z
  • check guard

But the "check guard" step could be move up two slots so it comes before we even extract y and z. In some cases (e.g. for more complex patterns y and z) this might make a significant difference.

  • @brandtbucher: Would this be easy to do? You'd have to check the guard for occurrences of the various capture variables.
  • All: We'd have to trust (or not care) that there isn't a function call in a guard that implicitly references one of the later capture variables, so we'd have to have some explicit language in the PEP to allow this. (Similar to the existing language about caching certain outcomes.)

gvanrossum avatar Sep 20 '20 04:09 gvanrossum

My immediate impression is that this seems fine. I haven’t thought about the full implications, but it shouldn’t be too hard to implement some sort of dependency graph in the pattern compiler (something like this will be needed to do simpler optimizations like caching name lookups anyways).

However, while I have been doing some experimenting on the side, I’m not planning on implementing any optimizations until the “dumb” implementation is already merged. The PR review will already be wildly complex, and I think that knowledge transfer for other maintainers will be better if we keep things simple to start.

brandtbucher avatar Sep 20 '20 04:09 brandtbucher

That’s a good plan, and I would even say if you can simplify the code by making it slower that would be fine.

gvanrossum avatar Sep 20 '20 04:09 gvanrossum

Cool. Isn’t there a quote attributed to you about Python having the “dumbest compiler ever” (or something to that effect)? ;)

brandtbucher avatar Sep 20 '20 04:09 brandtbucher

We might have to be very, very careful with this. At the top of my mind, I am thinking of something along the lines of:

y = 0

def get_y():
    return y

def foo(arg):
    global y
    match arg:
        case (x, y) if x == get_y(): ...

With the proposed optimisation, we clearly end up getting the wrong result.

So, if the compiler can prove that the expression in the guard has no way of accessing any variables bound by subsequent patterns, it might be fine.

However, the intent of the question is probably more about the guarantees given by the specs, though. The specs would probably not want to explicitly name or propose such optimisations, i.e. allow them only to be applied if the runtime semantics does not change at all. That being said, at the same time, the specs should make it clear that there is no guarantee ever that the interpreter will or will not check any given pattern.

In a case like the following, the interpreter does IMHO not have to check the patterns P and Q applied to 1 and 2, respectively, first. If it discovers that 3 == "three" from the outset, it can just skip the entire pattern directly. At the same token: it does not have to skip P and Q, either, of course.

match (1, 2, 3):
    case (P, Q, "three"): ...

Tobias-Kohn avatar Sep 21 '20 06:09 Tobias-Kohn

We'd have to trust (or not care) that there isn't a function call in a guard that implicitly references one of the later capture variables, so we'd have to have some explicit language in the PEP to allow this. (Similar to the existing language about caching certain outcomes.)

I think that if we're already fine breaking things like side-effecty attribute/item access, then breaking @Tobias-Kohn's example is a no-brainer. ;) Note that our handling of dotted names means that this pattern won't be an issue for assignments to attributes on self, for example... I think that using global/nonlocal name bindings in unintuitive ways like this is the only trivial way of producing broken code.

Are there any non-toy examples that are broken by this, but are still clearly better written using pattern matching than other control-flow constructs like if/else? I'm struggling to come up with any.

brandtbucher avatar Sep 21 '20 19:09 brandtbucher

I think it's a purely theoretical concern, but nevertheless people could write code to find out how the implementation works in practice (or just use "dis"), and then write obfuscated code that depends on it just to show off. :-)

gvanrossum avatar Sep 21 '20 19:09 gvanrossum

I consider this fully pepped; PEP 635 has explicit language:

If a guard stipulates that a variable x must be positive, say (i.e. if x > 0), the interpreter might check this directly after binding x and before any further subpatterns are considered.

gvanrossum avatar Oct 20 '20 17:10 gvanrossum