batteries feat: DiscrTree

I've made a new version of Lean's DiscrTree called RefinedDiscrTree with a lot of new features. It is able to index lambda's, forall's, and their bound variables. It also gives library metavariables an index so that it can keep track of which metavariables are the same. It also gives a score to each match pattern, so that you can get the results in order of how relevant they are.

The file uses the StateListT monad transformer (which does not satisfy the associativity monad law) that I made, so I also added its definition in another file. I admit that I have little idea how to use the @[inline] and @[always_inline] tags, so if those can be improved, please do so.

The main purpose is for use is to get better results in library searches, for tactics like exact? and apply?, and I want to publish a library rewrite tactic that uses this discrimination tree.

Nov 26 '23 12:11 JovanGerb

I think this works better under Std.Lean since it's a Lean-specific and not a general purpose data structure. Also, it seems to me that DiscrTree.Init.Basic could be merged with DiscrTree.Basic.

Note that all defs need a docstring.

Since Data.ListT is not used in the other files, it should not be in this PR.

Nov 26 '23 17:11 fgdorais

I tried an implementation with MLList, where I manually made a new version of withLocalDeclD that works with MLList, but building the DiscrTree then takes 4x longer, so that doesn't work unfortunately.

Nov 27 '23 15:11 JovanGerb

Thanks for the update! It's now clean enough to do a full review. I'm somewhat busy this week but I will get there and others might chime in too.

Please mark review items as completed as you go so it's easier to keep track of what's done.

You can also change the status of this PR as described here: https://github.com/leanprover/std4/pull/232 The awaiting-review status would be best at this time.

Nov 27 '23 21:11 fgdorais

awaiting-review

Nov 27 '23 21:11 JovanGerb

Apparently, CI checks will not run automatically for you until you get one merged PR. This is a big PR so if you have an idea of an easy PR that would be useful and merged quickly, that would fix that problem...

Nov 27 '23 21:11 fgdorais

I tried an implementation with MLList, where I manually made a new version of withLocalDeclD that works with MLList, but building the DiscrTree then takes 4x longer, so that doesn't work unfortunately.

It's interesting that MLList was so much slower. One possible explanation is that the current compiler doesn't know about Thunk. I think it would be useful to profile (ping @digama0).

Nov 27 '23 22:11 fgdorais

I think this is an interesting structure, but there needs to be a quite high bar before we have two DiscrTree implementations in Lean and Std. That's twice the potential for bugs and we would need to have clear guidance on why one should be preferred.

I think it may be better to demonstrate this works better on a tactic prior to committing to Std, assess whether the improvements could be made to the existing DiscrTree, and if not, look into upstreaming this with warnings that it is internal to the tactic for now.

Nov 27 '23 22:11 joehendrix

In terms of which one is preferred, I think there is no real reason to prefer the old implementation, other than that porting to a new version might take a lot of effort. The old version indexes less well, so I suppose that storage of and lookup in the old DiscrTree could be slightly more efficient. But I think any new tactic should use this version.

Nov 27 '23 22:11 JovanGerb

I heard that the exact? tactic will soon be moving to Std, and it would be great to get that running with the new DiscrTree. Our next goal is to get a point & click rewrite? into Mathlib. We already have this working, but not in a form that is ready for Mathlib.

Nov 27 '23 22:11 JovanGerb

I just ran a test comparing building the old and new version of DiscrTree, using heartbeats. The new version takes 1.27 times as many heartbeats compared to the old version. This calculation will be done by CI, so I think this performance difference is fine.

Nov 27 '23 23:11 JovanGerb

I added an argument (unify : Bool) to the unification/lookup function. By setting this to false, you can instruct the lookup algorithm to treat metavariables as free variables. The old DiscrTree implementation also has this distinction (although in two separate functions, and not with a Bool argument). For example a tactic like simp doesn't want to instantiate metavariables.

Nov 28 '23 12:11 JovanGerb

Why do you think the performance overhead is fine? Isn't DiscrTree used as a filter?

Are there any performance benchmarks for exact?? I am looking into merging that in Std and performance benchmarks would be great to have.

Nov 28 '23 16:11 joehendrix

My understanding is that in exact?, a relatively small amount of the computation time goes to searching in the DiscrTree, and most of the computation time is used for checking whether the results do indeed apply. If this hypothesis is true, then using an improved version of DiscrTree will make the tactic more efficient. But of course that has to be tested.

Nov 28 '23 16:11 JovanGerb

I should say that I've only compared the runtimes for building the DiscrTree, and not lookup, since that is the part that takes a lot of time. But when running a tactic like exact? you are only looking up in the DiscrTree. So this still needs to be tested.

Nov 28 '23 17:11 JovanGerb

For testing performance, I use Lean's heartbeats. Is that the generally accepted way of performance benchmarking?

Nov 28 '23 17:11 JovanGerb

Heartbeats count the number of small allocations, which is the same on any platform, but it isn't an accurate measure of time.

Nov 28 '23 17:11 fgdorais

I replaced the Key.fvar key with Key.opaque, which I think is a better name for it. This is used for indexing existentially quantified variables in the library so it is in a sense the opposite of Key.star. I added a new Key.fvar in the way it exists in the old DiscrTree, but I don't know of any use case for it. All FVarIds only exist locally, so if DiscrTree is used to search some library then it is useless to remember FVarIds. So this is only useful for locally built DiscrTrees.

Nov 28 '23 19:11 JovanGerb

It's a little complicated that we're moving exact? to Std at the same time as this is being developed --- if exact? was staying in Mathlib longer I would say that it would make sense to develop this new DiscrTree implementation there first.

But that's not the case, and this implementation offers some very tempting improvements! I am enthusiastic about this implementation, but we have to have a plan for the maintenance burden of a whole parallel implementation.

Here are some necessary-but-not-sufficient things I'd like to see before merging this to Std:

a parallel branch that actually uses this in exact? (i.e. this would have to be branch containing both this and #343)
a test that demonstrates this solves the problem finding HasDerivAt.add which I think is quite representative for the improvement this DiscrTree would bring to exact?
timing data (e.g.) using hyperfine for MathlibExtras/LibrarySearch.lean with both versions of DiscrTree, holding everything else constant
similarly, timing data for Mathlib's test/LibrarySearch/ directory

Nov 29 '23 01:11 kim-em

I think it may be better to demonstrate this works better on a tactic prior to committing to Std

I would like to emphasize this point Joe and Scott made and go even further: clear and significant impact should be shown for every single extension separately. I had a brief look at the extensions and it's not clear to me whether sufficient thought has gone into each of them:

.star now possibly leading to unification checks during matching seems to go against the core motivation behind the discr tree of avoiding unification. I also don't see how differences in local contexts are handled correctly.
I don't understand what is special about existentials compared to other binders
I'm less familiar with the subtleties of matching under binders so I can't comment on the implementation there but I expected some discussion of them in the documentation
I'm really not sure the discr tree is the correct place to score matches given that, again, its motivation is to be a mostly transparent optimization/pre-filter of unification
Why eta reduction but no eta expansion? It seems arbitrary at a glance.

In short, I fear this is changing/adding too many things at once for the high bar of polish and involved thought we expect from such data structures in core and std4. In this vein I would also like to remind people of Leo's guidance in the original thread. @_Leonardo de Moura|112857 said:

You’re absolutely right that higher-order matching is a complex challenge. Currently, the DiscrTree is not designed to handle this complexity. However, I do appreciate the potential benefits such changes could bring to certain projects. While I believe that these types of workarounds described above might not be the best fit for core or the standard library, it's great to explore these ideas, and include them in other packages.

Nov 29 '23 14:11 Kha

@Kha Thanks for the comments. I'll reply to them in order

I agree that calling isDefEq is possibly problematic. It seemed to me that in practice the isDefEq calls are quite simple, but a better alternative would be to use a simpler matching algorithm than isDefEq which behaves more like how the rest of the matching goes. So something like turning both expressions into an Array Key and checking that they are equal. And difference in local context could indeed be a problem, which would then also be solved with this.
Existential binders are special in the same way how universal binders are special, but instead of giving a variable that can match anything, it gives a variable that can match nothing. So you would use it to instantiate metavariables. An example where I have used this feature in my program is with the goal ∃ p : ℕ, n ≤ p ∧ Nat.Prime p, doing library search on Nat.Prime p, finding Nat.exists_prime_and_dvd, which after applying turned my goal into ∃ n_1 : ℕ, n_1 ≠ 1 ∧ ∀ p : ℕ, p ∣ n_1 → Nat.Prime p → n ≤ p. It seems like this feature is not necessary in most cases, but I think it does have some potential for library search tactics.
This documentation does indeed seem to be missing still.
I use the DiscrTree in an interactive library search program that shows the search results to the user, after filtering using isDefEq. Sorting them by score means that the result are a lot better. In cases where a lot of results come out of a discr tree and you have limited time to check them, it is very useful to know which results to check first.
I can think of two ways to solve the eta reduction problem: either do all possible eta reductions on lemmas when you put them into the DiscrTree or do all possible eta expansions when looking up an expression in the DiscrTree. I went for the first approach as this doesn't add any complications to the lookup algorithm, and it seems to work well. But I don't see any reason why the other approach wouldn't work.

It makes sense that there is a high bar of polish and involved thought and that more evidence is required that the new features are worth it. We are is the process of making a point & click library search tactic that will probably be PR'ed to Mathlib in the next couple of weeks. When this is finished, it will be a lot easier to see how the DiscrTree performs as you can directly play with it by applying it to any expression in a tactic state.

Nov 29 '23 17:11 JovanGerb

How should I go about unit testing/benchmarking? For StateListT I previously just replaced it in the code by an alternative to see the change, but that isn't exactly a unit test. Also should I use set_option profiler true for timing?

Dec 12 '23 08:12 JovanGerb

In short, I fear this is changing/adding too many things at once

Can we work out a way to do this much more incrementally, with essentially a separate PR for each behaviour change relative to the core DiscrTree?

Dec 12 '23 23:12 kim-em

That is also possible. I don't know if making changes there can break lot's of other things though, because class inference and simp calls could get slightly different results after a change. (Even if the new results are better). Which change would you suggest doing first?

(Oh sorry, I read in the core DiscrTree instead of relative to the core DiscrTree)

Dec 13 '23 08:12 JovanGerb

I think the profiler is a good tool to use for performance analysis.

For benchmarks, I think it's context specific. Since we are talking about first using this in the context of a tactic, I'd recommend testing your tactic with both this discrimination and the Lean core one. That's what I am doing for the lazy discrimination tree in the context of the exact? tactic.

That's assuming the tree is associated with a tactic. If we use more general documentation and naming suggesting this could replace the core discrimination tree, then I'd think we'd need testing that the higher-order discrimination tree outperforms the existing one with simp. That would require forking core and significantly more work.

Dec 13 '23 15:12 joehendrix

What is the state of this PR?

I would like to move fprop tactic from SciLean to mathlib and this refined discr tree would be really useful. I can't use normal DiscrTree as it can't index lambda functions properly.

Jan 10 '24 13:01 lecopivo

What is the state of this PR?

I don't think the PR will merge very soon, since we need to first have a tactic that uses this DiscrTree and this depends on yet another PR into std, "Pattern-based unfolding tactic".

However, the code is very much useable, so if you want to use it in mathlib somewhat soon, there should be a way to do this. Maybe I should move this PR over to mathlib, or you can temporarily just copy the code. Maybe that's the best option for now so that things don't accidentally break on your side if I make more changes to the DiscrTree at some point.

Jan 10 '24 14:01 JovanGerb

Ok, I will just copy it for now and let's see if it work as I hope for the needs of fprop. And if it does it might be a good motivation to move this PR forward

Jan 10 '24 14:01 lecopivo

batteries batteries copied to clipboard

feat: DiscrTree

batteries
batteries copied to clipboard