lean4 perf: rewrite UnusedVariables lint

This is a rewrite of the UnusedVariables lint to inline and simplify many of the dependent functions to try to improve the performance of this lint, which quite often shows up in perf reports.

The mvar assignment scanning is one of the most expensive parts of the process, so we do two things to improve this:
- Lazily perform the scan only if we need it
- Use an object-pointer hashmap to ensure that we don't have quadratic behavior when there are many mvar assignments with slight differences.
The dependency on Lean.Server is removed, meaning we don't need to do the LSP conversion stuff anymore. The main logic of reference finding is inlined.
We take fvarAliases into account, and union together fvars which are aliases of a base fvar. (It would be great if we had UnionFind here.)

More docs will be added once we confirm an actual perf improvement.

Jan 15 '24 17:01 digama0

!bench

Jan 15 '24 17:01 digama0

Mathlib CI status (docs):

❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-15 18:58:37) View Log
❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-16 18:51:01) View Log
❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 09:51:55) View Log
❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 10:09:45) View Log
❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 13:05:45) View Log
❗ Std/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 7d5b6cf097c435a5e45e4549115945f4f577fc56 --onto c5fd88f5e118738425dd55b7592de435b7db8483. (2024-02-27 11:42:02)
❗ Std/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 846300038f0e692fe08ea83d410f96c233624489 --onto 2d18eff544093882d45b7e1e97a0876247324f2b. (2024-03-19 10:06:58)

Jan 15 '24 18:01 leanprover-community-mathlib4-bot

The new warnings produced in Std and Mathlib are false positives, e.g.

./.lake/packages/std/././Std/Data/Ord.lean:87:25: warning: unused variable `ord` [linter.unusedVariables]

is not unused.

Jan 16 '24 01:01 kim-em

I'm going to try to extract some of those as additional test cases, since the lean4 test suite passed.

Jan 16 '24 01:01 digama0

The new warnings produced in Std and Mathlib are false positives, e.g.
./.lake/packages/std/././Std/Data/Ord.lean:87:25: warning: unused variable `ord` [linter.unusedVariables]
is not unused.

@[inline] def compareOn [ord : Ord β] (f : α → β) (x y : α) : Ordering :=
  compare (f x) (f y)

Are you sure? It looks pretty unused to me. Is there an IgnoreFunction which makes this an exception?

It looks like a bug in the original code relating to autoImplicits, since

@[inline] def compareOn {α β} [ord : Ord β] (f : α → β) (x y : α) : Ordering :=
  compare (f x) (f y)

does trigger the unused variables warning on ord. (I think there is a reasonable argument that this should be exempted because it is a variable in the function signature, but this is independent of the baseline check.)

Jan 16 '24 16:01 digama0

I reviewed all of the new linter warnings, and all of them appear to be correct (to spec; I know some people have doubts about this algorithm but it's doing what it is designed to do). Do I need to make a fix PR before it is possible to !bench this?

Jan 16 '24 17:01 digama0

Mathlib's lean-pr-testing-3186 branch needs to be made into a PR if you want to run !bench on Mathlib.

I'm not sure why the bot didn't respond to your !bench request above. Let me try it again.

Jan 16 '24 22:01 kim-em

!bench

Jan 16 '24 22:01 kim-em

Here are the benchmark results for commit c7f6b4275d744f880306b812c2fa902f1c2896f0. There were significant changes against commit 12dc171c4862df0177de4adec56a8f739359719b:

  Benchmark   Metric          Change
  ===========================================
- stdlib      type checking     2.3% (11.1 σ)

Jan 16 '24 22:01 leanprover-bot

I'm not sure why the bot didn't respond to your !bench request above. Let me try it again.

I'm pretty sure it's because I don't have permission to run the bot (the list of approved users is separate from org members, and I think only you and Sebastian and Leo can run it, and possibly other FRO members as well).

Jan 17 '24 09:01 digama0

Note that two tests in the Lean 4 test suite accidentally test false positives. See the linterUnusedVariables.lean diff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8cc

Jan 17 '24 09:01 mhuisi

Mathlib's lean-pr-testing-3186 branch needs to be made into a PR if you want to run !bench on Mathlib.

https://github.com/leanprover-community/mathlib4/pull/9803, are you saying that I should run !bench in the comments of that PR or that running it here will work now that the PR exists?

Jan 17 '24 09:01 digama0

Note that two tests in the Lean 4 test suite accidentally test false positives. See the linterUnusedVariables.lean diff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8cc

One part of the algorithm is supposed to address this: If there is a global constant info over the same span as a variable declaration, then it is not linted. Apparently this only works for (mutual) def though, not let rec and where declarations, which seems like a bug in the info tree generation. (Go to definition on whereVariable.x does seem to work though.)

Jan 17 '24 10:01 digama0

Mathlib's lean-pr-testing-3186 branch needs to be made into a PR if you want to run !bench on Mathlib.

leanprover-community/mathlib4#9803, are you saying that I should run !bench in the comments of that PR or that running it here will work now that the PR exists?

If you want to run the mathlib benchmark, you have to say !bench there. And very likely you’ll also want to manually create a throw-away PR with the head set to whatever commit you want to compare it to (nightly-testing-2024-xx-yy or so). Yes, it’s very tedious.

Jan 17 '24 10:01 nomeata

Note that two tests in the Lean 4 test suite accidentally test false positives. See the linterUnusedVariables.lean diff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8cc

One part of the algorithm is supposed to address this: If there is a global constant info over the same span as a variable declaration, then it is not linted. Apparently this only works for (mutual) def though, not let rec and where declarations, which seems like a bug in the info tree generation. (Go to definition on whereVariable.x does seem to work though.)

The adjusted implementation of References.lean in the linked PR fixes the issue, though.

Jan 17 '24 10:01 mhuisi

Turns out the reason for the issue is because it is being explicitly excluded. I pushed a fix, but given that this was deliberate behavior I guess @Kha and @mhuisi should work out whether it is better to lint on unused let rec and where definitions or not. My inclination is to not lint them, and handle them using the same method as unused global definitions (which currently is nothing at all, but in the future may be a global analysis, or possibly an end-of-file analysis for private declarations).

Jan 17 '24 11:01 digama0

Do you already have numbers on specific, big proofs? E.g. I believe a few theorems in https://github.com/cedar-policy/cedar-spec/blob/main/cedar-lean/Cedar/Thm/Validation/Typechecker/LUB.lean took a significant time in the linter, would be great to have a comparison. I'm mostly wondering whether this change reduces the issue to a non-issue for them or whether it will pop up again for proofs twice as big.

Jan 22 '24 16:01 Kha

I don't, but there are asymptotic improvements in this PR, the metavariable scanning was reduced from O(n^2) to O(n) and I believe that's the main bottleneck so I would expect it to have outsized impact on the big proofs that reportedly had issues with this linter.

I just pulled the cedar repo (217b74e) and the LUB file was noticeably slow to compile, but with set_option profiler true the only measurable linter case was (the mutual block containing) Cedar.Thm.lub_trans, which took 143ms on v4.5.0-rc1, so not too extreme. Mathlib presumably has bigger examples but the stats we have don't point to specific theorems. EDIT: on lean-pr-releases-3186 it takes 95ms, so about 33% faster, consistent with the average 23% speedup on mathlib with bigger wins on bigger theorems.

Jan 23 '24 21:01 digama0

I added a bunch of docs, I think this is ready for review. I still need to double check that all the builtin ignore fns are documented correctly and all do something - I think some of them are currently useless.

Feb 29 '24 16:02 digama0

I still need to double check that all the builtin ignore fns are documented correctly and all do something - I think some of them are currently useless.

Is that something that is affected by this PR, or should that wait for another PR?

Feb 29 '24 16:02 Kha

Well only insofar as the docs were just added and it would be good to make sure they are correct

Feb 29 '24 16:02 digama0

Tests need a copy-produced

Is this auto-corrupt kicking in? I can't make sense of it.

Mar 04 '24 18:03 digama0

copy-produced is described here: https://lean-lang.org/lean4/doc/dev/testing.html#fixing-tests

Mar 05 '24 09:03 Kha

lean4 lean4 copied to clipboard

perf: rewrite UnusedVariables lint

lean4
lean4 copied to clipboard