Rename {find,canonicalize} to unsafe{Find,Canonicalize}
Hello, and thanks for the good work.
This is a design proposal
Both find and canonicalize, as defined, are abstraction breaking, because the data structure is not in a sound state unless rebuild has been called beforehand. My proposal, in keeping with Haskell's correctness-first spirit, is that:
- the functions named find and canonicalize should call rebuild before querying the data structure (note: in the case where the data structure is already rebuilt, the work list is empty, therefore rebuild will be very fast anyway)
- for the sake of performance tuning, direct queries to the data structure, called unsafeFind and unsafeCanonicalize, would be introduced, with the documentation explaining that the onus is on the programmer to only call them on a “rebuilt” e-graph.
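To make the two bullets concrete, here is a minimal sketch on a toy e-graph. Everything in it is an assumption made for illustration: the EGraph record, add, merge, the whole-graph rebuild, and the choice to have find return the rebuilt graph alongside the id are hypothetical and are not the library's actual types or signatures.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

type ClassId = Int

-- | An e-node: a function symbol applied to argument classes.
type ENode = (String, [ClassId])

-- | Toy stand-in for an e-graph (hypothetical, not the library's type).
data EGraph = EGraph
  { parents  :: IM.IntMap ClassId    -- union-find parent pointers
  , hashcons :: M.Map ENode ClassId  -- e-node -> class containing it
  , worklist :: [ClassId]            -- classes merged since the last rebuild
  }

emptyEGraph :: EGraph
emptyEGraph = EGraph IM.empty M.empty []

-- | Direct query: follows parent pointers, never rebuilds.
-- The caller is responsible for passing a rebuilt e-graph.
unsafeFind :: EGraph -> ClassId -> ClassId
unsafeFind eg c = case IM.lookup c (parents eg) of
  Just p | p /= c -> unsafeFind eg p
  _               -> c

-- | Add an e-node, returning its class (a fresh one if the node is new).
add :: ENode -> EGraph -> (ClassId, EGraph)
add (f, args) eg = case M.lookup n (hashcons eg) of
  Just c  -> (c, eg)
  Nothing -> (fresh, eg { parents  = IM.insert fresh fresh (parents eg)
                        , hashcons = M.insert n fresh (hashcons eg) })
  where
    n     = (f, map (unsafeFind eg) args)
    fresh = IM.size (parents eg)

-- | Union two classes and defer congruence repair to the work list.
merge :: ClassId -> ClassId -> EGraph -> EGraph
merge a b eg
  | ra == rb  = eg
  | otherwise = eg { parents  = IM.insert rb ra (parents eg)
                   , worklist = ra : worklist eg }
  where
    ra = unsafeFind eg a
    rb = unsafeFind eg b

-- | Restore congruence. Returns immediately when the work list is empty.
-- (A naive whole-graph pass, repeated until nothing changes.)
rebuild :: EGraph -> EGraph
rebuild eg
  | null (worklist eg) = eg
  | otherwise          = rebuild (foldl step eg0 (M.toList (hashcons eg)))
  where
    eg0 = eg { hashcons = M.empty, worklist = [] }
    step g ((f, args), c) =
      let n' = (f, map (unsafeFind g) args)
          c' = unsafeFind g c
      in case M.lookup n' (hashcons g) of
           Just d  -> merge d c' g   -- congruent nodes: merge their classes
           Nothing -> g { hashcons = M.insert n' c' (hashcons g) }

-- | Safe query: forces the rebuild first, then looks up the representative.
find :: ClassId -> EGraph -> (ClassId, EGraph)
find c eg = let eg' = rebuild eg in (unsafeFind eg' c, eg')

-- | Classes of f(a) and f(b), with a and b merged but no rebuild yet.
example :: (ClassId, ClassId, EGraph)
example =
  let (a,  g1) = add ("a", [])  emptyEGraph
      (b,  g2) = add ("b", [])  g1
      (fa, g3) = add ("f", [a]) g2
      (fb, g4) = add ("f", [b]) g3
  in (fa, fb, merge a b g4)
```

On example, unsafeFind still reports different representatives for f(a) and f(b), because the congruence repair is sitting in the work list, while find reports the same one for both. Calling find on an already rebuilt graph hits the null (worklist eg) guard and does no extra work.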
Thank you!
I haven't given much thought to how one might end up using find and canonicalize in practice.
One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.
The Data.Equality.Graph module allows one to create, modify, and choose when to rebuild e-graphs. So a consumer of that module (such as Data.Equality.Saturation) understands the e-graph is only guaranteed to have its invariants maintained after rebuild.
Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild 🙂 -- meaning functions like merge could only be run inside it, rebuild wouldn't exist, and we wouldn't need any unsafe, just a difference between the functions that can be run inside the invariant breaking computation and the others. I don't know at what point we're complicating it beyond need, but we could also just have this in additional modules.
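A rough sketch of what such an interface could look like follows. All names here (Merging, rebuilding, the dirty flag, the toy EGraph) are hypothetical, and the "rebuild" step is a placeholder rather than real congruence repair. Because the toy is pure, it uses a hand-rolled state monad instead of ST and its rank-2 type.

```haskell
import qualified Data.IntMap.Strict as IM

type ClassId = Int

-- | Toy stand-in: union-find parents plus a flag saying repairs are pending.
data EGraph = EGraph
  { parents :: IM.IntMap ClassId
  , dirty   :: Bool
  }

-- | Invariant-breaking computations. In a real module the constructor
-- and runMerging would not be exported, so 'rebuilding' is the only way out.
newtype Merging a = Merging { runMerging :: EGraph -> (a, EGraph) }

instance Functor Merging where
  fmap f (Merging k) = Merging $ \g -> let (a, g') = k g in (f a, g')

instance Applicative Merging where
  pure a = Merging $ \g -> (a, g)
  Merging kf <*> Merging ka = Merging $ \g ->
    let (f, g')  = kf g
        (a, g'') = ka g'
    in (f a, g'')

instance Monad Merging where
  Merging k >>= f = Merging $ \g -> let (a, g') = k g in runMerging (f a) g'

-- | Internal helper: follow parent pointers to the representative.
root :: EGraph -> ClassId -> ClassId
root g c = case IM.lookup c (parents g) of
  Just p | p /= c -> root g p
  _               -> c

-- | Only available inside 'Merging'.
merge :: ClassId -> ClassId -> Merging ()
merge a b = Merging $ \g ->
  let ra = root g a
      rb = root g b
  in if ra == rb
       then ((), g)
       else ((), g { parents = IM.insert rb ra (parents g), dirty = True })

-- | The analogue of runST: run the merges, then restore the invariants.
rebuilding :: Merging a -> EGraph -> (a, EGraph)
rebuilding m g = let (a, g') = runMerging m g in (a, restore g')
  where
    restore x = x { dirty = False }  -- placeholder for real congruence repair

-- | Queries live outside 'Merging', so they only ever see rebuilt graphs.
find :: EGraph -> ClassId -> ClassId
find = root
```

With this shape, something like rebuilding (merge 0 1 >> merge 1 2) g is the only way to obtain a modified e-graph, so no exported function can observe one with pending repairs.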
So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph. The contract of having a library in which you can choose when to rebuild the invariants is to understand that until rebuilding it, the invariants aren't maintained. Meaning that if you called merge on the e-graph a couple of times, and then tried to find an id, you'd find the current representative for it -- which is not necessarily the same one you would get after calling rebuild.
> One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.

> So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph.
In Haskell tradition, “unsafe” means: when calling this function, there is a proof obligation that the type system can't discharge and that the programmer has to discharge themselves. This is not the case for merge: it's always safe to call merge. But it is the case for the current find and canonicalize, which only make sense on rebuilt e-graphs.
You are arguing otherwise, but I believe the “right” abstraction is to think of the egg data structure as a lazy e-graph (where the laziness is embodied by the actions deferred to the worklist). When you call find or canonicalize, you force the data structure. You also have rebuild, which is kind of like seq in Haskell: something that doesn't change the semantics (divergence notwithstanding), but can be used for performance reasons.
> Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild 🙂
It is likely that you could do something like this. Also something with linear types ( :blush: ). Both sound more painful than they are worth (I mean, I certainly intend to push linear types until it's not unpleasant to do this sort of abstraction, but I wouldn't advise doing so today unless it has a lot of value). But honestly, I think that the simple way outlined in the original issue is really fine.
I quite like the lazy e-graph framing.
Under that light I must say it does make sense.
I agree we can then have an unsafeFind and an unsafeCanonicalize that don't force the e-graph to be rebuilt!
I also agree that both the monadic and the linear types approaches would be too much here, and that the proposed solution, seen in that light, is good.
I'll leave the issue open and close it when we change the interface in a MR.
Re: linear types: I'm a fan of linear types and of your work on Linear Haskell, and wanted to mention I wrote an undergraduate thesis on synthesis from linear types and more recently wrote a GHC plugin implementing the synthesis using GHC 9.0's linear types :) (which I got to show to Mathieu at ZuriHac!)
(This is getting very off topic, but this is very neat! I'd love to see you demonstrate it to me some time)