gleam icon indicating copy to clipboard operation
gleam copied to clipboard

Full exhaustiveness checking of assignments and case expressions

Open lpil opened this issue 7 years ago • 22 comments

edit: See comments below for current status.

lpil avatar Oct 11 '18 22:10 lpil

I was lucky enough to see Simon Peyton Jones talk about how this is now implemented in the Haskell compiler at Code Mesh LDN 2019 (talk title: REVISITING PATTERN MATCH OVERLAP CHECKS IN HASKELL).

The approach sounded good (and SPJ recommended that I use it when I asked) but the paper has not been released yet. Either work off the slides or see if the paper is out when we get around to implementing this.

I believe the same talk was given (possibly in more detail) at the Haskell eXchange 2019.

lpil avatar Nov 07 '19 15:11 lpil

Can also see the OCaml compilers code for it, it's quite clean. :-)

OvermindDL1 avatar Nov 07 '19 16:11 OvermindDL1

Here is P. Sestoft: ML pattern match compilation and partial evaluation: http://dotat.at/tmp/match.pdf

michallepicki avatar Dec 08 '19 21:12 michallepicki

Paper mentioned in SPJ's talk: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/gadtpm-acm.pdf edit: that was the old one, here is the new approach he recommends: https://www.microsoft.com/en-us/research/publication/lower-your-guards-a-compositional-pattern-match-coverage-checker/

michallepicki avatar Dec 08 '19 21:12 michallepicki

SPJ's talk got published: https://www.youtube.com/watch?v=SWO5OzSxD6Y

michallepicki avatar Dec 20 '19 13:12 michallepicki

From the list of papers I guess this is not a good issue to start with. Is there anything I can help with. is the SPJ approach the one that is accepted as the "right" approach for this issue?

CrowdHailer avatar Apr 21 '20 08:04 CrowdHailer

is the SPJ approach the one that is accepted as the "right" approach for this issue?

I didn't know how to evalute the options so I cornered SPJ at a conference and he told me I should use that one. Good enough for me!

lpil avatar Apr 21 '20 08:04 lpil

I've been reading the SPJ paper tonight, and I think there's a few simplifications that apply to Gleam.

  1. Since Gleam is strict, we don't need to worry about handling divergent sets or strictness constraints at all (I'm not positive on this and need to do a deeper pass to be sure)
  2. Since Gleam doesn't support GADTs, we don't need to track type equalities (in rules CCONVAR and UCONVAR, see 4.3)
  3. The satisfiability oracle can ignore all constraints and the algorithm remain sound (section 6.1)

By ignoring (3) to start, we lose exhaustiveness checking of guards, but it becomes possible to add support later without changing the core algorithm. This lets us avoid also having to write a type-constraint solver for the initial version. We could even leave out the CGUARD and UGUARD rules entirely to start, and add those later.

QuinnWilton avatar May 05 '20 07:05 QuinnWilton

Also of note, the paper Elm used (Maranget) is mentioned under Related Work:

image

It sounds similar to SPJ's algorithm, but without support for GADTs, laziness, guards, or nested patterns. I'm no expert on this stuff, but SPJ's algorithm sounds like a natural evolution of things, even if only some of the new features are relevant to Gleam.

QuinnWilton avatar May 05 '20 07:05 QuinnWilton

Amazing work!

RE GADTs: I would be interested in possibly having them at a later date so if we can be open to that extension then that would be excellent.

lpil avatar May 05 '20 07:05 lpil

I think that it would just be a case of whether we track the type equalities in those two rules or not. I'm on my phone now and headed to sleep in a moment, so I can't grab a picture, but figure 2 (maybe it was 3) summarizes the algorithm, and as far as I could tell, support for GADTs comes from calculating Δ' in CCONVAR and UCONVAR.

Assuming we represented the type constraints as an enum, it'd just be a case of adding type equalities to that enum + tracking them there later.

I think the broad algorithm can be kept intact, and we just not track the things we don't care about yet.

I'm also not an expert on this stuff, so it'd be really helpful to double check with someone who is to make sure I'm not completely misunderstanding the algorithm :)

QuinnWilton avatar May 05 '20 07:05 QuinnWilton

I've been doing a lot of research on this the past few days, and the 2015 GADT paper is considered by SPJ to be slow, brittle, and hard to understand.

His Code Mesh LDN talk describes a simpler approach at a high level (and is an excellent and very approachable talk).

I haven't read it in its entirety yet, but that talk seems to be formalized in a draft paper from March 2020, "Lower Your Guards": https://www.microsoft.com/en-us/research/publication/lower-your-guards-a-compositional-pattern-match-coverage-checker/

I'm not promising anything concrete yet, but I'm going to take a stab at implementing a reduced subset of this paper against a simple language in Elixir, to test its applicability to Gleam.

QuinnWilton avatar May 06 '20 20:05 QuinnWilton

Amazing investigative work @QuinnWilton ! Thank you!

lpil avatar May 06 '20 20:05 lpil

I don't know if this is interesting but I have been going through my error logs on sentry and cleaning up all the asserts and incomplete cases I put in due to being lazy.

If there was a warning to catch.

  1. usage of let with a type that has more than one varient. i.e. let Ok(x) = my_func
  2. There was a warning for every case statement that didn't have a branch for each varient

Those warnings would have caught 38 out of 39 changes I've made. The last one was an assert I had done against a list not being empty.

So I guess what I'm saying it if those kinds of warnings are much easier than exhausivte ness checking I would have got 95% of the benifit

CrowdHailer avatar Feb 22 '21 11:02 CrowdHailer

Perhaps there is a half-way solution that only checks the top level that could be beneficial until we have the real solution.

lpil avatar Feb 22 '21 11:02 lpil

Reference shared by @pd-andy

If it helps here's the paper I used to implement pattern matching with exhaustiveness and redundancy checking: https://www.cs.tufts.edu/~nr/cs257/archive/luc-maranget/jun08.pdf And here's my implementation of it: https://github.com/jfecher/ante/blob/master/src/types/pattern.rs

This looks like a very good candidate as it has a paper, code, and it produces an optimal decision tree for use in code generation

lpil avatar Aug 11 '21 13:08 lpil

I tried to hack on the "half-way" solution mentioned earlier, unfortunately Rust defeated me for now, but I believe this is a good place to start. If kind: AssignmentKind is Let, and pattern is a Constructor, check that value_typ is a record with only one constructor. Otherwise print a nice error.

Then once this code works it can be refactored to the more general case somewhere around here

michallepicki avatar Jan 09 '22 09:01 michallepicki

I recently implemented two algorithms in https://gitlab.com/yorickpeterse/pattern-matching-in-rust/, as part of adding pattern matching and exhaustiveness checking to Inko. Since both implementations are in Rust they might be of interest to Gleam :smiley:

If Gleam's pattern matching is similar to ML and Rust, adopting it shouldn't prove too difficult. I have not looked into supporting GADTs though, so if Gleam supports (or wants to support those) more work may be needed.

yorickpeterse avatar Jun 01 '22 21:06 yorickpeterse

Thank you so much @YorickPeterse ! Do you have any recommendations as to which one of the two you might use? jacobs2021 seems good!

lpil avatar Jun 07 '22 12:06 lpil

@lpil I'd probably go with the jacobs2021 implementation. It's a bit simpler compared to the sestoft1996 implementation, in part because it doesn't rely on modifying lists all over the place. It also produces better decision trees, as the sestoft1996 algorithm makes no attempt at reducing the size of the resulting decision tree.

Setting that aside, the jacobs2021 implementation in the linked repository is a bit more extensive, and has support for integer literals, ranges, OR patterns, and guards. This should make porting it to e.g. Gleam a bit easier, as there's less time/effort needed to figure out how to support those types of patterns.

yorickpeterse avatar Jun 09 '22 04:06 yorickpeterse

You champ, thank you so much. You've saved me an awful lot of time by doing and sharing all this work

lpil avatar Jun 09 '22 08:06 lpil

@lpil Glad to hear it's useful and saving people time, as pattern matching is indeed a tricky and time consuming subject :smiley:

yorickpeterse avatar Jun 09 '22 08:06 yorickpeterse

@lpil Out of curiosity and seeing the algorithm is based on my repository here, how useful did that code turn out to be? Or to put it differently, was there anything I could've done better to explain things? :smiley:

yorickpeterse avatar Nov 27 '23 14:11 yorickpeterse

It was extremely useful @yorickpeterse ! Thank you. It's largely your code that I've hit with a hammer until it more-or-less fits with the Gleam compiler. There's a lot of things I'd like to refactor (mostly stemming from Gleam being a project I learnt Rust with) but that can come later. Or perhaps not at all!

lpil avatar Nov 27 '23 15:11 lpil

@lpil Glad to hear it was helpful! :smiley:

yorickpeterse avatar Nov 27 '23 23:11 yorickpeterse