lean4
lean4 copied to clipboard
perf: rewrite UnusedVariables lint
This is a rewrite of the UnusedVariables lint to inline and simplify many of the dependent functions to try to improve the performance of this lint, which quite often shows up in perf reports.
- The mvar assignment scanning is one of the most expensive parts of the process, so we do two things to improve this:
- Lazily perform the scan only if we need it
- Use an object-pointer hashmap to ensure that we don't have quadratic behavior when there are many mvar assignments with slight differences.
- The dependency on
Lean.Serveris removed, meaning we don't need to do the LSP conversion stuff anymore. The main logic of reference finding is inlined. - We take
fvarAliasesinto account, and union together fvars which are aliases of a base fvar. (It would be great if we hadUnionFindhere.)
More docs will be added once we confirm an actual perf improvement.
!bench
Mathlib CI status (docs):
- ❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-15 18:58:37) View Log
- ❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-16 18:51:01) View Log
- ❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 09:51:55) View Log
- ❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 10:09:45) View Log
- ❌ Mathlib branch lean-pr-testing-3186 built against this PR, but testing failed. (2024-01-17 13:05:45) View Log
- ❗ Std/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlibbranch. Trygit rebase 7d5b6cf097c435a5e45e4549115945f4f577fc56 --onto c5fd88f5e118738425dd55b7592de435b7db8483. (2024-02-27 11:42:02) - ❗ Std/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlibbranch. Trygit rebase 846300038f0e692fe08ea83d410f96c233624489 --onto 2d18eff544093882d45b7e1e97a0876247324f2b. (2024-03-19 10:06:58)
The new warnings produced in Std and Mathlib are false positives, e.g.
./.lake/packages/std/././Std/Data/Ord.lean:87:25: warning: unused variable `ord` [linter.unusedVariables]
is not unused.
I'm going to try to extract some of those as additional test cases, since the lean4 test suite passed.
The new warnings produced in Std and Mathlib are false positives, e.g.
./.lake/packages/std/././Std/Data/Ord.lean:87:25: warning: unused variable `ord` [linter.unusedVariables]is not unused.
@[inline] def compareOn [ord : Ord β] (f : α → β) (x y : α) : Ordering :=
compare (f x) (f y)
Are you sure? It looks pretty unused to me. Is there an IgnoreFunction which makes this an exception?
It looks like a bug in the original code relating to autoImplicits, since
@[inline] def compareOn {α β} [ord : Ord β] (f : α → β) (x y : α) : Ordering :=
compare (f x) (f y)
does trigger the unused variables warning on ord. (I think there is a reasonable argument that this should be exempted because it is a variable in the function signature, but this is independent of the baseline check.)
I reviewed all of the new linter warnings, and all of them appear to be correct (to spec; I know some people have doubts about this algorithm but it's doing what it is designed to do). Do I need to make a fix PR before it is possible to !bench this?
Mathlib's lean-pr-testing-3186 branch needs to be made into a PR if you want to run !bench on Mathlib.
I'm not sure why the bot didn't respond to your !bench request above. Let me try it again.
!bench
Here are the benchmark results for commit c7f6b4275d744f880306b812c2fa902f1c2896f0. There were significant changes against commit 12dc171c4862df0177de4adec56a8f739359719b:
Benchmark Metric Change
===========================================
- stdlib type checking 2.3% (11.1 σ)
I'm not sure why the bot didn't respond to your
!benchrequest above. Let me try it again.
I'm pretty sure it's because I don't have permission to run the bot (the list of approved users is separate from org members, and I think only you and Sebastian and Leo can run it, and possibly other FRO members as well).
Note that two tests in the Lean 4 test suite accidentally test false positives. See the linterUnusedVariables.lean diff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8cc
Mathlib's
lean-pr-testing-3186branch needs to be made into a PR if you want to run!benchon Mathlib.
https://github.com/leanprover-community/mathlib4/pull/9803, are you saying that I should run !bench in the comments of that PR or that running it here will work now that the PR exists?
Note that two tests in the Lean 4 test suite accidentally test false positives. See the
linterUnusedVariables.leandiff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8cc
One part of the algorithm is supposed to address this: If there is a global constant info over the same span as a variable declaration, then it is not linted. Apparently this only works for (mutual) def though, not let rec and where declarations, which seems like a bug in the info tree generation. (Go to definition on whereVariable.x does seem to work though.)
Mathlib's
lean-pr-testing-3186branch needs to be made into a PR if you want to run!benchon Mathlib.leanprover-community/mathlib4#9803, are you saying that I should run
!benchin the comments of that PR or that running it here will work now that the PR exists?
If you want to run the mathlib benchmark, you have to say !bench there. And very likely you’ll also want to manually create a throw-away PR with the head set to whatever commit you want to compare it to (nightly-testing-2024-xx-yy or so). Yes, it’s very tedious.
Note that two tests in the Lean 4 test suite accidentally test false positives. See the
linterUnusedVariables.leandiff at https://github.com/leanprover/lean4/pull/3082/files#diff-a33d4a408eca48eae24f881090327a42577735140184b3485ec87f08af36a8ccOne part of the algorithm is supposed to address this: If there is a global constant info over the same span as a variable declaration, then it is not linted. Apparently this only works for (mutual)
defthough, notlet recandwheredeclarations, which seems like a bug in the info tree generation. (Go to definition onwhereVariable.xdoes seem to work though.)
The adjusted implementation of References.lean in the linked PR fixes the issue, though.
Turns out the reason for the issue is because it is being explicitly excluded. I pushed a fix, but given that this was deliberate behavior I guess @Kha and @mhuisi should work out whether it is better to lint on unused let rec and where definitions or not. My inclination is to not lint them, and handle them using the same method as unused global definitions (which currently is nothing at all, but in the future may be a global analysis, or possibly an end-of-file analysis for private declarations).
Do you already have numbers on specific, big proofs? E.g. I believe a few theorems in https://github.com/cedar-policy/cedar-spec/blob/main/cedar-lean/Cedar/Thm/Validation/Typechecker/LUB.lean took a significant time in the linter, would be great to have a comparison. I'm mostly wondering whether this change reduces the issue to a non-issue for them or whether it will pop up again for proofs twice as big.
I don't, but there are asymptotic improvements in this PR, the metavariable scanning was reduced from O(n^2) to O(n) and I believe that's the main bottleneck so I would expect it to have outsized impact on the big proofs that reportedly had issues with this linter.
I just pulled the cedar repo (217b74e) and the LUB file was noticeably slow to compile, but with set_option profiler true the only measurable linter case was (the mutual block containing) Cedar.Thm.lub_trans, which took 143ms on v4.5.0-rc1, so not too extreme. Mathlib presumably has bigger examples but the stats we have don't point to specific theorems. EDIT: on lean-pr-releases-3186 it takes 95ms, so about 33% faster, consistent with the average 23% speedup on mathlib with bigger wins on bigger theorems.
I added a bunch of docs, I think this is ready for review. I still need to double check that all the builtin ignore fns are documented correctly and all do something - I think some of them are currently useless.
I still need to double check that all the builtin ignore fns are documented correctly and all do something - I think some of them are currently useless.
Is that something that is affected by this PR, or should that wait for another PR?
Well only insofar as the docs were just added and it would be good to make sure they are correct
Tests need a copy-produced
Is this auto-corrupt kicking in? I can't make sense of it.
copy-produced is described here: https://lean-lang.org/lean4/doc/dev/testing.html#fixing-tests