catala icon indicating copy to clipboard operation
catala copied to clipboard

Encode default terms as pattern matching

Open adelaett opened this issue 1 year ago • 3 comments

Overview

This proposal suggests a design enhancement for the default calculus, which could improve the compilation process and increase the language's expressiveness.

Motivation

The idea comes from a simple observation: the default term <e1, ... en | ej :- ec> can be viewed as a form of pattern matching: <e1, ... en | ej is pj :- ec>. If ej represents not a boolean condition but a pattern, <| ej is pat :- ec>, we could encode traditional pattern matching as follows:

< <| ej is pat1 :- ec1>, <| ej is pat2 :- ec2> | Nothing :- empty>

This structure would allow variables to be bound within the consequence of a default term. For example:

match x with 
| Some y when y >= 0 => y + 1
| Some y when y < 0 => y - 1
| None => 0
end

can be encoded as

< <| x is Some y when y >= 0 :- <y + 1>>,
  <| x is Some y when y < 0 :- <y - 1>>,
  <| x is None :- <0>>
  | Nothing :- Empty
>

Note that the Nothing is different from the existing false in the boolean language, and of patterns. It is more like an empty pattern

High-Level Syntax

In terms of surface syntax, this change would mean we could add under condition x is Some y, making the variable y accessible in the current context. Here is a more concrete example:

definition z
under condition x is Some y when y >= 0
consequence equals y+1

definition z
under condition x is Some y when y < 0
consequence equals y

definition z
under condition x is None
consequence equals y

Compilation Approach

For the compilation process, we could integrate match ... with structures with default terms, utilizing pattern-matching compilation techniques to optimize the evaluation of these terms such as [maranget-ml2008]. (with modifications related to conflict and nothing patterns)

In particular, to check there is no conflict between too branches, one need only to check the branches where the pattern is the same. This could reduce the number of cases to handle.

@article{maranget-ml2008,
        author =    "Luc Maranget",
        title =     "Compiling Pattern Matching to Good Decision Trees",
        booktitle = "Workshop on the Language ML",
        year =      2008,
        month =     September,
        publisher = "ACM Press"
}

adelaett avatar Nov 03 '24 09:11 adelaett

Hm, I am not yet totally convinced of the point, but will be happy to discuss it :)

In particular, doesn't the presence of when clauses defeat most of the pattern-matching optimisations ? (I need to re-read Luc's paper!)

Two remarks:

  • x is Some y when y >= 0 exists with the syntax x with pattern Some of y and y >= 0 ; it does not, however, bind the variable within the definition body when used in an under condition clause, which would be extremely useful (I thought there was an issue already open at this subject, but could not find it)
  • I have recently found quite a few cases where it would be useful to have, inside the AST, a expr MATCHES constructor predicate ; for now it has to be encoded as a pettern-matching listing all the cases, there is no other way to deconstruct an enumeration term. An example is the temporary equality check expansion pass that I've written, which due to this limitation of the AST requires two nested full-matches — quadratic in size with the number of variants. (Of course, for this case, allowing real pattern matching on tuples would be an even better solution)

AltGr avatar Nov 04 '24 10:11 AltGr

The problem of the expr matches constructor is the same as x with pattern Some of y for the condition; You cannot access it inside the definition, hence you need to re-match the same expression.

For the opitmization on pattern matching, i talked with @Nadrieril about pattern matching with for catala, and we arrived at the conclusion that it is an non-trivial interesting problem, especially if we want to compile to ocaml pattern matching among other things.

We mainly focused on two points: the first one on how to express the conflict condition using only pattern matching. It should be possible to do it using either exception-counting with decision trees, with two runs, or using back-tracking automata that backtrack the first found exception.

However it is non-trivial to adapt those to nested catala expressions, as it is not obvious to find the correct ways to flatten back the pattern matching.

Nevertheless, @Nadrieril indicated that it should be possible to encode pattern matching with a lot of conditions if the conditions have a certain form. For example if they are intervals on a total order, it is possible to have a logarithmic decision tree. This could be interesting for compiling more efficiently catala.

--

In the meantime, modifying the compilation of this new version of default terms is not very hard. Instead of writing

let res =
    List.fold (function Empty, v | v, Empty => v | v, v => Conflict) Empty [e1, ..., en]
in
if ej then ec else Empty

We can write:

let res =
    List.fold (function Empty, v | v, Empty => v | v, v => Conflict) Empty [e1, ..., en]
in
match ej with |pj => ec | _ => Empty

This is minor modification given regular pattern matching are already part of the lcalc intermediate representation.

adelaett avatar Nov 04 '24 15:11 adelaett

One major difference that I can see is that pattern-matching is supposed to stop processing at the first match; here, unless we assume there will be no conflicts, we will need to process all of them anyway. Another is that during a pattern-matching, you can leverage the structure of the matched term to drive the decision tree, and here there isn't in general much structure to rely on ?

AltGr avatar Nov 06 '24 09:11 AltGr

Adding here some thoughts that have come since this was initially discussed, and hadn't been written yet ; in particular, in user-facing concerns.

When an input of a scope is of a variant type, you sometimes write things such as:

definition bar under condition foo with pattern InterestingCase content payload
consequence equals ...

Currently, there is no way to extract payload from the condition and into the ... expression: you have to match again and write

definition bar under condition foo with pattern InterestingCase
consequence equals
  match bar with pattern
  -- InterestingCase of payload: ...
  -- anything: impossible

The encoding described in this PR would allow the condition to bind the payload directly into the result expression and the shorter form to be used.

AltGr avatar Dec 23 '25 15:12 AltGr