ocaml The Pattern-Matching Bug: fix totality information

This PR is part of the #7241 fix; it is the first un-merged PR in the stack and is ripe for review.)

#13138 introduces information on the "transitive mutability" of argument positions in pattern-matching submatrices: an argument is transitively mutable if it is located transitively under a mutable field (from the root of the value).

The present PR uses this information to pessimize the compilation of switches generated by the pattern-matching compiler. All switches that are in a transitively mutable position are assumed to be Partial, even if the type-checker says that they are Total. (If you want, there is a longer explanation as a code comment in the PR itself.)

This change fixes all known remaining instances of the issue describe in #7241 (unsound interaction between pattern-matching and mutation), in particular all known-wrong behaviors in the testsuite.

(TODO: this is missing a Changes entry.)

Degradation in generated code

The code generated by the pattern-matching compiler will be degraded if the following conditions are all met:

The position is transitively mutable.
At the position that we are generating a switch for, a language construction is used for which Total improves the quality of generated code. This can happen if either (A) only a strict subset of possible constructors is handled at in this submatrix, with the rest having been handled before, or (B) all valid constructors are handled by this submatrix, but this relies on checking that GADT equations are unsatisfiable, and the pattern-matching compiler does not know this. (On the other hand, the pattern-matching compiler does know about the set of constructors possible for a non-GADT sum type or for closed polymorphic variants.)

So the affected code looks like one of those:

(* case (A) above *)
let f : bool option ref -> ... = function
| { contents = None } -> ...
| { contents = Some true } -> ...
| _ when guard () -> ...
| { contents = Some false } (* here *) -> ...

type _ gadt = Int : int t | Bool : bool t

(* case (B) above *)
let g : int gadt ref -> ... = function
| { contents = Int } (* here *) -> ...

In the case (A) (example f), the behavior was previously unsound (in particular if we consider concurrent mutations). There are instances of (B) (example g) where the compiler was sound, and now generates slightly worse code -- the function g above is one such example -- by including a test with a Match_failure case.

In a follow-up PR I implement a simple heuristic that re-optimizes the compilation of some GADT matchings in mutable position, so that the function g above is not pessimized anymore.

May 07 '24 15:05 gasche

I am not highly confident that this is the right characterization, but it may be helpful anyway: If a pattern does not bind any variables, then there's no need to check that the relevant subfield of the scrutinee still matches that pattern before performing the action.

Jun 13 '24 21:06 ncik-roberts

I checked that #13154 doesn't de-pessimize either case I describe, though you might want to confirm.

Jun 13 '24 21:06 ncik-roberts

I'm afraid that I don't have an easy answer to your question. I don't see a feasible, non-invasive way to improve the pattern-matching compiler that would re-optimize these programs and not have other negative consequences.

Below is to explain the situation of this examples in details, and I list a few avenues for improvement that I considered -- I did not found any of them to be convincing.

My impression is that the right choice to do here is probably to favor correctness over (complexity or performance), and accept that those examples now generate slightly worse code.

Understanding what happens

In those two examples, the compiler generates a split during pattern-matching compilation, and the matrices below the split get slightly worse compilation.

First example:

let both_some t =
  match t with
  | { x = Some x; y = Some y } -> Some { x; y }
  | { x = None; y = _ } | { x = _; y = None } -> None

with dbg = true we see the following debug output:

    SPLIT
      Some x Some y 
      None _ 
      _ None 
    INTO:
      First matrix:
        PM:
          Some x Some y 
          None _ 
      Default matrix 3:
        PM:
          _ None

The split comes from the pattern { x = _; y = None } which cannot be sent to a single branch of the decomposition of x into None or Some -- if we wanted to avoid a split we would have to include it in both submatrices, and this is precisely the sort of potential-code-size-explosion that the compilation scheme tries hard to avoid.

In this case it is a bit unfortunate as we know that in fact this {x = _; y = None } is equivalent to { x = Some _; y = None }, as the x = None case is already handled earlier. If we change the code to use this more precise pattern

let both_some t =
  match t with
  | { x = Some x; y = Some y } -> Some { x; y }
  | { x = Some _; y = None } | { x = None; _ } -> None

then you get the nice compilation you expect, without a split. The conceptual difference is that all cases now start with a head constructor for the first column.

Second example:

let rank_of_first_a = function
  | { field1 = A; _ } -> 1
  | { field1 = B; field2 = A; _ } -> 2
  | { field1 = B; field2 = B; field3 = A } -> 3
  | { field1 = B; field2 = B; field3 = B } -> -1
;;

let rank_of_last_a = function
  | { field3 = A; _ } -> 3
  | { field3 = B; field2 = A; _ } -> 2
  | { field3 = B; field2 = B; field1 = A } -> 1
  | { field3 = B; field2 = B; field1 = B } -> -1
;;

The rank_of_first_a function can be compiled from left to right without any split.

  PM:
    A _ _ 
    B A _ 
    B B A 
    B B B

The rank_of_last_a function needs to split on each column, so (because the record fields are mutable) it now generates worse code.

  SPLIT
    _ _ A 
    _ A B 
    A B B 
    B B B 
  INTO:
    First matrix:
      PM:
        _ _ A 
        _ A B 
    Default matrix 2:
      PM:
        A B B 
        B B B

Re-optimizing the code?

The following ideas come to mind to re-optimize the generated code.

Column compilation order (second example)

The rank_of_last_a example is a textbook example of the fact that it is sometimes useful to choose another column decomposition order than pure left-to-right. Luc Maranget's research on pattern-matching compilation has come up with good heuristics to pick the next column of the matrix do decompose, but OCaml (and I think pattern-matching compilers for other languages as well) uses left-to-right always. This is simpler, and in fact it is important in presence of GADTs, because GADTs can propagate type equalities from left to right in the patterns, and therefore matching on columns in a different order may result in considering incoherent columns, that is, columns with patterns of different, incompatible types.

For this reason, I don't think it is realistic to hope to move from the left-to-right compilation order in the context of this PR / fixing this bug.

Avoiding unecessary splits (first example)

I pointed out that it is possible to manually specialize x = _; y = None into x = Some _; y = None, and get a better compiled output. Could the compiler figure this out by itself? I think that the answer is yes, because the information of which constructors may flow into the wildcard is related to the information computed by the usefulness analysis. When a wildcard may only correspond to one constructor (and not several), refining it into this construction should always improve the code. (Optimizing this might improve other examples that do not contain mutable fields and are not affected by this PR.) But computing this is non-trivial and it may have a negative impact on the performance of the pattern-matching compiler -- we probably don't want to attempt such an optimization on all wildcard patterns that the splitter encounters. I don't see an easy, clearly-a-good idea way to do this.

Binding mutable fields earlier

I think that the compilation of the first example would improve if we were to read mutable record fields eagerly, when matching on their record, instead of binding them only when they come up in the first column. In particular this would avoid the double read on (field_mut 1 t/272) that you noticed in the first example.

This is an approach that we proposed in #12556. I ultimately decided against it because it was a more invasive change to the compilation strategy, that came with the risk of pessimizing other examples by generating un-necessary mutable reads. In particular, in the second example, notice how the compilation of rank_of_first_a performs its mutable reads on-demand, and the value 1 is returned after reading just field1 = A and none of the other fields. With the compilation scheme of #12556 we would loose this good property, the code would start by reading all fields in all cases, just like rank_of_last_a.

Treating these failures as unspecified behavior

With this PR, the pattern-matching compiler downgrades some submatrices from Total to Partial, and it will compile them like any other Partial submatrix, that is, by ensuring that there is a catch-all case that goes to a Match_failure clause.

Another option would be to teach the compiler that (1) yes, the failure case is possible, but (2) we don't particularly care about getting a Match_failure exception here, fusing this with another case would be fine. In essence, we would treat the "the value changed under our feet during matching" as an "unspecified behavior" situation (any type-safe behavior is accepted) rather than a "clean dynamic failure" behavior.

For example, if we look at your first example, a part of the generated -drawlambda output was as follows before this PR:

  with (2) (let (*match*/284 =o (field_mut 1 t/275)) 0)))))

and now it looks like this:

  with (2)
   (let (*match*/284 =o (field_mut 1 t/275))
     (if *match*/284 (exit 1) 0))))

This corresponds to a sub-clause of the form { ... ; y = None } -> None.

Before the pattern-matching compiler would generate code that reads the field y, but then notice from totality information that the only possible constructor at this point is None, and thus elide the check that y is indeed None and return 0 (that is, None) right away.

With this PR, the compiler is aware that the value of y could have changed in the meantime, so it may be something else than None, and therefore it does not elide the check. If the check y = None fails, it goes to (exit 1) which is the Match_failure clause.

Instead we could say: okay, y may be something else than None, but in that case we are fine with Match_failure or something else that makes sense, for example 0, and so you can generate (if *match*/284 0 0), or maybe even just 0.

I have a vague idea of how we could try to do this:

When we compile a submatrix, take note of whether it is empty, in which case it is an "immediate action" that does not rely on any extra condition or variable bound in its patterns.
When we degrade Total due to mutable patterns, use a special degraded mode AlmostTotal instead of Partial.
In mk_failaction_pos, if we need to generate a fail action but we are in AlmostTotal mode, instead of the final exit we can use any other exit that corresponds to an empty submatrix.

However, note that this optimization approach comes at the expense of weaker semantics. With the current approach, users know that if their program is affected by a value being mutated while it is matched, they will get a clean Match_failure so they will notice the issue and investigate. With the proposed "unspecified" semantics, we silently generate better code and users do not know that their program contains what is arguably a programming error.

Aside: a remark on redundant reads

The first example that you give has the frustrating property that there are splits, as you can see in the -drawlambda output, but the exit ... jumping to the split submatrices are inlined away by the "simplification" pass that follows, as can be seen in the -dlambda output.

RAWLAMBDA:
    (both_some/273 =
       (function t/275
         (catch
           (let (*match*/281 =o (field_mut 0 t/275))
             (catch
               (if *match*/281
                 (let
                   (x/276 =a (field_imm 0 *match*/281)
                    *match*/282 =o (field_mut 1 t/275))
                   (if *match*/282
                     (let (y/277 =a (field_imm 0 *match*/282))
                       (makeblock 0 (makemutable 0 x/276 y/277)))
                     (exit 2)))
                 (let (*match*/283 =o (field_mut 1 t/275)) 0))
              with (2)
               (let (*match*/284 =o (field_mut 1 t/275))
                 (if *match*/284 (exit 1) 0))))
          with (1)
           (raise
             (makeblock 0 (global Match_failure/20!) [0: "test1bis.ml" 7 2])))))

LAMBDA:
    (both_some/273 =
       (function t/275
         (let (*match*/281 =o (field_mut 0 t/275))
           (if *match*/281
             (let (*match*/282 =o (field_mut 1 t/275))
               (if *match*/282
                 (makeblock 0
                   (makemutable 0 (field_imm 0 *match*/281)
                     (field_imm 0 *match*/282)))
                 (let (*match*/284 =o (field_mut 1 t/275))
                   (if *match*/284
                     (raise
                       (makeblock 0 (global Match_failure/20!)
                         [0: "test1bis.ml" 7 2]))
                     0))))
             0))))

Notice that the exit (2), corresponding to the second submatrix (which follows a split), gets inlined in the code generated by the first submatrix. As a consequence, the second mutable read on field_mut 1 t, which occurs in the second submatrix, ends up within the lexical scope of the first read, which looks a bit silly.

(The redundant load gets eliminated by CSE (Common Subexpression Elimination) later down in the native-compiler pipeline, but the redundant branch is preserved.)

Jun 14 '24 15:06 gasche

Thanks for the explanation. That clarifies to me what's going on with these examples. (Yes, indeed, I should have looked at -drawlambda first.)

I agree with your conclusion about the first three alternatives for re-optimization: either it's not clear if there's a viable approach, or there's a clear downside and it's hard to decide the right tradeoff.

Treating these failures as unspecified behavior?

I'm mostly unmoved by the argument about weaker semantics. Whether or not the failure case is inserted is already sensitive to the pattern match compiler's splitting algorithm. I don't think the user can reliably predict these choices — as you note, splitting handles { x = _; y = None } differently than { x = Some _; y = None } even in contexts where the two patterns seem equivalent because { x = None; _ } was already matched.

In other words, I find the existing semantics to be weak already. (To be clear, I'm not complaining about this.) Even the type-checker is confused (I say somewhat facetiously): it doesn't produce a warning for both_some, despite the fact that the result of the pattern match compiler can raise Match_failure.

Here's a perhaps silly approach: If the final case's pattern can be replaced by a catch-all pattern (i.e. the match is total and the final case binds no variables), then do so before invoking the pattern match compiler. I think it does improve the compilation of both of these cases. It may be too limited in utility to add this special case, but it could be worth considering.

Jun 14 '24 19:06 ncik-roberts

My last suggestion, to replace the last case in a “total” match with a wildcard case, doesn’t work in general because some wildcards can match a value that isn’t matched by the last pattern. (In particular, if the scrutinee is mutated during matching.)

The cases I posted about are tantalizingly close to staying untouched by this PR. But I can’t think of a small variation on the existing approach that would fix them. Such cases seem infrequent &; in a codebase with many millions of lines of code, I only found four affected pattern matches where a Match_failure case is newly inserted.

The apparent infrequency (if you believe my claim) makes me wonder whether a new warning should be issued when pattern matching inserts a Match_failure case but the type-checker insists the match is total. This would let affected users decide that the pattern match should be rewritten to avoid matching on a mutable field.

Do you have an opinion on a warning?

(this would also reduce the concern I posted about earlier, where now a match can raise Match_failure even in cases where the type-checker succeeds without warning and where adding a catchall case would actually raise a warning.)

Jun 21 '24 22:06 ncik-roberts

My last suggestion, to replace the last case in a “total” match with a wildcard case, doesn’t work in general because some wildcards can match a value that isn’t matched by the last pattern. (In particular, if the scrutinee is mutated during matching.)

I don't understand, do you have an example?

My intuition is that if a match is known to be Total, and the current submatrix has a single row, then a pattern that does not bind any variable can indeed be replaced by a wildcard. Instead of weakening Total into Partial under mutable positions we could weaken it into a new Almost_total partiality status, and also wildcardize single-row patterns in this case (which would deviate from the current behavior, but be acceptable under the interpretation that the behavior in case of scrutinee mutation is unspecified).

Do you have an opinion on a warning?

My thinking:

This should be easy to implement, so I don't mind giving it a try.
The warning is not about the (high-level) semantics of the program, but about its performance characteristics. I think that it makes sense if we are convinced that some users will want to catch this pessimization, and try to avoid it, because it brings them performance benefits. Personally I am not really convinced that there would be performance benefits. (There are other ways to compile pattern-matching that result in similar code changes with slight optimizations and pessimizations of the generated control flow, and to my knowledge no one particularly cares about investigating those choices because the performance of pattern-matching is good enough in practice -- I think the current change preserves this property.)
In the very rare cases where expert users know they want an in-depth understanding of a pattern-matching performance, they can already inspect the -dlambda output. (I think we could improve on this slightly by turning the pattern-matching debug output into a new -dmatchcomp option, and possibly thinking about intermediate representations to make the compilation code nicer and improve its observability.)
So in a sense, the warning is more about managing the transition (letting people discover code whose performance profile might have changed, and decide to investigate) than something that will matter in the long term / for newly written code.
I don't think that non-expert users should be exposed to this warning, because it cannot be explained from the source language semantics, you have to teach people about the inner workings of the pattern-matching compiler to explain it.
Therefore, I think that the warning should be disabled by default.

To summarize: I agree that the pessimization will be observed very rarely in practice, and furthermore I think that when it does happen it will not degrade the program performance in practice. My intuition is that the proposed change is thus acceptable as a silent change. But if you want to be sure that performance-paranoid users have an easy way to evaluate the impact of the change on their codebase, I am happy to introduce a warning in this case, disabled by default.

Jun 22 '24 08:06 gasche

Example where my "replace with wildcard" suggestion produces dubious behavior

example below produces a different result if you replace the last pattern (which binds no variables) with _. The result of this rewrite (example_with_wildcard) produces an output, 6, despite the fact that the scrutinee at no point contains C in the mut field of b. This seems clearly undesirable, and I suggest abandoning my "replace with wildcard" suggestion.

type 'a myref = { mutable mut : 'a }
type abc = A | B | C
type t = {a: bool; b: abc myref }

let example () =
  let input = { a = true; b = { mut = A } } in
  match input with
  | {a = false; b = _} -> 1
  | {a = _;     b = { mut = B }} -> 2
  | {a = _;     b = _} when (input.b.mut <- B; false) -> 3
  | {a = true;  b = { mut = A }} -> 4
  | {a = _;     b = _} when (input.b.mut <- A; false) -> 5
  | {a = true;  b = { mut = C }} -> 6
;;

let example_with_wildcard () =
  let input = { a = true; b = { mut = A } } in
  match input with
  | {a = false; b = _} -> 1
  | {a = _;     b = { mut = B }} -> 2
  | {a = _;     b = _} when (input.b.mut <- B; false) -> 3
  | {a = true;  b = { mut = A }} -> 4
  | {a = _;     b = _} when (input.b.mut <- A; false) -> 5
  | _ -> 6
;;

let run f =
  match f () with
  | i -> string_of_int i
  | exception Match_failure _ -> "raised Match_failure"
;;

let () = Printf.printf "example: %s\n" (run example)
let () = Printf.printf "example_with_wildcard: %s\n" (run example_with_wildcard)

It produces the output:

example: raised Match_failure
example_with_wildcard: 6

I also suspect this approach can be unsound if the final variable-less pattern introduces type equations, but I haven't tried to produce an example given that the previous one seems compelling enough.

Clarifying my reason for suggesting for a warning

There are two reasons in my mind for suggesting a warning, but one reason seems much more important to me than the other:

Performance. (As you explain,) the extra checks might be undesirable for users
Semantics. It is surprising for the type-checker's totality checker to accept a program (warning free), and then for the resulting program to be able to raise Match_failure at runtime.

I am more concerned about semantics than performance. Expert users might care about the performance overhead of extra checks, but less expert users will still care about avoiding partial match exceptions.

Does that change how you feel about the warning? I would completely agree with your characterization if the warning was just about performance, but in light of the "semantics" reason, I feel differently about this point:

So in a sense, the warning is more about managing the transition [...] than something that will matter in the long term / for newly written code.

What kind of warning?

I suspect this warning would be mostly straightforward to implement: "Match may be partial if scrutinee is mutated during the match." The criteria: pattern match compilation inserts a Match_failure case in a match the type-checker claimed to be total.

There are at least two problems with such a warning, in my view:

No easy local rewrite: It's bothersome that the user can't disable the warning by adding the needed catchall case themselves. The type-checker will reject it with "redundant pattern". (The inability to add the catchall case is bothersome even without a new warning, but the new warning makes it so users will have to figure out how to work around it when it is raised.)
Noisy: The warning will raise even if the match doesn't involve any of the troublesome cases: lazy patterns, boxing (and thus the possibility of GC finalizers being run), when guards, or multithreaded mutation. The last of these is a property of the surrounding code, not the match itself, and thus not something that can be detected by the pattern match compiler.

These problems make me feel wary of recommending a warning that is enabled by default.

Fundamentally there is a mismatch between the type-checker's assumptions about pattern matching and the pattern match compiler's actual behavior. A warning would just paper over this difference.

The present PR is a clear improvement over the previous unsoundness, so I'm inclined to say that that any work on the warning could be left until after we merge this PR. But I will wait to hear what you think about the warning, in light of the "semantics" reason I explain above (and also the problems I note).

Jun 23 '24 17:06 ncik-roberts

I agree that your dubious example is dubious. Indeed my thinking was that if a clause is taken because it is wildcardized, then we initially started with an input matching this clause before side-effects changed things. Your example demonstrates that this is not necessarily the case. I will add it to the testsuite right away.

Regarding semantics: I don't disagree that suddenly raising Match_failure is nasty. But I would also note that we have let this known soundness bug lie open for around a decade; I think it is because people are convinced that the bug in fact never occurs in the real world. (This perception could have been colored by the difficulty of fixing the bug, which encouraged us to minimize the real-world relevance.) I think that fixing it is the right thing to do, even if it pessimizes slightly a small-but-not-empty set of programs; but the warning has a different cost/benefits analysis.

Our intuition is that the segfault never occurs in practice, and that only a few programs are pessimized: that's infinitely more! Pessimizing very slightly some more programs is fine. But on the other hand, the warning is designed to pester their authors about a Match_failure exception, while we believe that they will never observe it, and we don't really know how they should fix their code.

Fixing the code: In a few cases authors could tweak their code with guidance from a pattern-matching-compilation expert. Otherwise they can decide to catch Match_failure around the match, or just decide that they audited the code and that they don't expect the issue to affect them; and then silence the warning.

My preference goes to adding the warning, disabled by default. The main value of the warning in my opinion is to let people that would worry about the compilation change understand what parts of their codebase are affected -- without the warning, it is fairly hard to do this analysis, you need to use a forked compiler over your codebase.

Jun 23 '24 19:06 gasche

Let me write down an idea I had in passing: instead of raising Match_failure, we could jump back to the beginning of the pattern-matching logic. This approach does not introduce more failures, and it is arguably a better behavior in all cases other than when guards (lazy thunks, concurrent races, asynchronous callbacks): the pattern-logic restarts and then has the behavior expected from the source-level semantics. But when guards break the idea, I think, as (1) it could result in effectful when-guards being executed several times, and worse (2) a well-chosen when-guard that makes the value flicker may produce in an infinite loop of restarts, another form of partiality.

Jun 23 '24 20:06 gasche

Adding the warning, disabled by default, sounds good to me.

I see your argument that the performance change is what motivates the warning. I want to lodge one last argument in favor of mentioning the presence of the extra Match_failure case in the warning text: I know some OCaml users that refuse to use when-guards because they know about the possible unsoundness. So, even if the unsoundness is considered unlikely to happen in practice, it's still something that can factor into users' choices. I expect these users will want to enable this warning in perpetuity, as it will predictably flag when pattern matching can result in a behavior not listed in any of its cases (the Match_failure exception).

Jun 23 '24 21:06 ncik-roberts

I started working on a warning, but this is proving more interesting than planned -- I can't get it to fire on a testsuite example which should, and it fires in the stdlib in places where it should not.

Jun 23 '24 22:06 gasche

Hi @gasche, there is something that puzzles me, for instance with Nick's first example:

type 'a t =
  { mutable x : 'a
  ; mutable y: 'a
  }

let both_some t =
  match t with
  | { x = Some x; y = Some y } -> Some { x; y }
  | { x = None; y = _ } | { x = _; y = None } -> None

Is there any reason for the mutable fields to be mutated? As there is no guard nor lazy pattern, do we assume that the mutable fields can be mutated by a concurrent thread, or something else?

Jun 24 '24 16:06 maranget

@maranget yes, in this case mutations would come from concurrent mutations. My general approach is to assume that any mutable field may be mutated "during" matching, so that two reads on the same field may always yield different results.

Jun 24 '24 19:06 gasche

I rebased the present PR on top of #13338, which fixes a code-quality regression introduced in #13076.

Jul 30 '24 08:07 gasche

I implement and document a warning as discussed in #13341. That PR is on top of #13153, which removes the check_partial machinery (an older, incomplete heuristic to degrade partiality information, that is subsumed by the present PR).

The reason for being on top of #13153 rather than on top of the current PR directly is that the natural/simple implementation of the warning is after the place where the check_partial machinery is currently called, so the check_partial can (wrongly) silence the warning in some cases. In other words, the codebase is cleaner after #13153 and it is (slightly) simpler to implement the warning correctly.

Jul 30 '24 09:07 gasche

I am happy with this PR after doing some more testing with #13338. I'm interested to see it rebased over https://github.com/ocaml/ocaml/pull/13338/commits/4dd3c6a2af53d4dfc5cfcb561d126ba8e02934bb before I click "approve", but I don't have major comments at this point.

Jul 31 '24 15:07 ncik-roberts

I just rebased this PR on top of the (new) second half of #13338, and also there was a small conflict with #13342 (a nice-to-have conflict, the new version is shorter and simpler).

Jul 31 '24 17:07 gasche

Thanks! Merging.

Aug 22 '24 13:08 gasche