lean4
lean4 copied to clipboard
feat: rm partial / bounds checks in Array.qsort
With omega
in core, it is now more feasible to perform the necessary proofs to eliminate the partial
from Array.qsort
, as well as the many uses of Array.get!
and Array.swap!
. (This is also a perfect example of the kind of code that would benefit from the proposed Vector
API.)
I wrote this code over a year ago, but the stance regarding proofs in lean core has shifted so I think this is now more likely to be accepted. This is in any case blocking some proofs in Std which are built on top of uses of Array.qsort.
!bench
Here are the benchmark results for commit 228efff59167790cf0c7638390d28f341522ce88. The entire run failed. Found no significant differences.
Mathlib CI status (docs):
- ❗ Std/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlib
branch. Trygit rebase 88ee503f02e047c8c3fd1eb3ac8310b123380038 --onto d3e004932c1f5ac3946850692940512d381c7634
. (2024-04-17 12:10:44) - ❗ Std/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlib
branch. Trygit rebase 036b5381f0f1a193f1734ca9ffb608f09c7b7f0b --onto d3e004932c1f5ac3946850692940512d381c7634
. (2024-04-17 20:12:37) - ❗ Std/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlib
branch. Trygit rebase 036b5381f0f1a193f1734ca9ffb608f09c7b7f0b --onto 62cdb51ed5b9d8487877d5a4adbcd4659d81fc6a
. (2024-04-24 06:24:17)
!bench
Here are the benchmark results for commit 28c305aa877266384d62bdfd6fc4e37ce39d64e8. There were significant changes against commit 036b5381f0f1a193f1734ca9ffb608f09c7b7f0b:
Benchmark Metric Change
============================================
- stdlib instructions 2.0% (371.8 σ)
- stdlib task-clock 1.3% (11.5 σ)
- stdlib type checking 9.0% (11.0 σ)
- stdlib wall-clock 8.0% (86.5 σ)
Do you know why this gets the stdlib
benchmark slower by a clearly significant margin? This seems rather counter intuitive to me. Are the proofs in Array.qsort
taking so long?
The proof is actually suspiciously slow (the definition of qpartition
is about 2 seconds). I think it might actually be the compiler too; this code contains three if statements at the beginning and an @[inline]
and this seems to hit a relatively bad case for the compiler (there were other variations of this code that behaved even worse). The generated code itself seems to be fine though.
This code is slightly suboptimal, in that n
is passed as an argument to qpartition
even though it is redundant, and likewise lo
and hi
are passed unnecessarily to qpartition.loop
. My original version of this code used some tricks to eliminate this, at the cost of making the proof (and code) significantly more obtuse; I decided that this was not a good tradeoff for maintenance reasons and submitted the simpler version here. Ideally, this would be solved by marking these parameters as "ghost" / not computationally relevant.
!bench
(The version that was previously benched had a sharing bug. Unsure if that is related to the stdlib slowdown.)
Here are the benchmark results for commit 824a7fecba3db57fc8c10932369511a3c8506709. There were significant changes against commit 036b5381f0f1a193f1734ca9ffb608f09c7b7f0b:
Benchmark Metric Change
===========================================
- stdlib type checking 4.6% (17.2 σ)
- stdlib wall-clock 3.8% (23.5 σ)
Marking draft pending further benchmarking (build time and run time)