feat: proper recursive specialization
This PR enables the specializer to also recursively specialize in some non trivial higher order situations.
The main motivation for this change is the upcoming changes to do notation by sgraf. In there he uses combinators such as
@[specialize, expose]
def List.newForIn {α β γ} (l : List α) (b : β) (kcons : α → (β → γ) → β → γ) (knil : β → γ) : γ :=
match l with
| [] => knil b
| a :: l => kcons a (l.newForIn · kcons knil) b
in programs such as
def testing :=
let x := 42;
List.newForIn (β := Nat) (γ := Id Nat)
[1,2,3]
x
(fun i kcontinue s =>
let x := s;
List.newForIn
[i:10].toList x
(fun j kcontinue s =>
let x := s;
let x := x + i + j;
kcontinue x)
kcontinue)
pure
inspecting this IR right before we get to the specializer in the current compiler we get:
[Compiler.eagerLambdaLifting] size: 22
def testing : Nat :=
fun _f.1 _y.2 : Nat :=
return _y.2;
let x := 42;
let _x.3 := 1;
fun _f.4 i kcontinue s : Nat :=
fun _f.5 j kcontinue s : Nat :=
let _x.6 := Nat.add s i;
let x := Nat.add _x.6 j;
let _x.7 := kcontinue x;
return _x.7;
let _x.8 := 10;
let _x.9 := Nat.sub _x.8 i;
let _x.10 := Nat.add _x.9 _x.3;
let _x.11 := 1;
let _x.12 := Nat.sub _x.10 _x.11;
let _x.13 := Nat.mul _x.3 _x.12;
let _x.14 := Nat.add i _x.13;
let _x.15 := @List.nil _;
let _x.16 := List.range'TR.go _x.3 _x.12 _x.14 _x.15;
let _x.17 := @List.newForIn _ _ _ _x.16 s _f.5 kcontinue;
return _x.17;
let _x.18 := 2;
let _x.19 := 3;
let _x.20 := @List.nil _;
let _x.21 := @List.cons _ _x.19 _x.20;
let _x.22 := @List.cons _ _x.18 _x.21;
let _x.23 := @List.cons _ _x.3 _x.22;
let _x.24 := @List.newForIn _ _ _ _x.23 x _f.4 _f.1;
return _x.24
Here the kcontinue higher order functions pose a special challenge because they delay the discovery of new specialization opportunities. Inspecting the IR after the current specializer (and a cleanup simp step) we get functions that look as follows:
[simp] size: 7
def List.newForIn._at_.testing.spec_0 i kcontinue l b : Nat :=
cases l : Nat
| List.nil =>
let _x.1 := kcontinue b;
return _x.1
| List.cons head.2 tail.3 =>
let _x.4 := Nat.add b i;
let x := Nat.add _x.4 head.2;
let _x.5 := List.newForIn._at_.testing.spec_0 i kcontinue tail.3 x;
return _x.5
[simp] size: 14
def List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 l b : Nat :=
cases l : Nat
| List.nil =>
return b
| List.cons head.2 tail.3 =>
fun _f.4 x.5 : Nat :=
let _x.6 := List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 tail.3 x.5;
return _x.6;
let _x.7 := 10;
let _x.8 := Nat.sub _x.7 head.2;
let _x.9 := Nat.add _x.8 _x.1;
let _x.10 := 1;
let _x.11 := Nat.sub _x.9 _x.10;
let _x.12 := Nat.mul _x.1 _x.11;
let _x.13 := Nat.add head.2 _x.12;
let _x.14 := @List.nil _;
let _x.15 := List.range'TR.go _x.1 _x.11 _x.13 _x.14;
let _x.16 := List.newForIn._at_.testing.spec_0 head.2 _f.4 _x.15 b;
return _x.16
Observe that the specializer decided to abstract over kcontinue instead of specializing further recursively. Thus this tight loop is now going through an indirect call.
This PR now changes the specializer somewhat fundamentally to handle situations like this. The most notable change is going to a fixpoint loop of:
- Specialize all current declarations in the worklist
- If a declaration
- succeeded in specializing run the simplifier on it and put it back onto the worklist
- if it didn't don't put it back onto the worklist anymore
- Put all newly generated specialisations on the worklist
- Recompute fixed parameters for the current SCC
- Repeat until the worklist is empty
Furthermore, declarations that were already specialized:
- only consider
fixedHOparameters for specialization, in order to avoid termination issues with repeated specialization and abstraction of type class parameters under binders - recursively specialized declarations only allow specialization if at least one of their fixedHO arguments is not a parameter itself. The reason for allowing this in first generation specialization is that we refrain from specializing inside the body of a declaration marked as
@[specialize]. Thus we need to specialize them even if their arguments don't actually contain anything of interest in order to ensure that type classes etc. are correctly cleaned up within their bodies.
Closes #10924
!radar
Benchmark results for fd5b87db61ee3c9a5337f813c1118e247e9563ec against 0e83422fb6337985c3354c263a9510cbcb4b1e05 are in! @hargoniX
Major changes (72)
- ✅
Init.Data.BitVec.Lemmas re-elab//instructions: -11.1G (-1.3%) - 🟥
Init.Data.BitVec.Lemmas//instructions: +11.6G (+7.8%) - 🟥
Init.Data.List.Basic re-elab//instructions: +9.5G (+5.8%) - 🟥
Init.Data.List.Sublist async//instructions: +889.8M (+7.4%) - 🟥
Init.Data.List.Sublist re-elab -j4 (watchdog rss)//instructions: +6.6G (+3.5%) - 🟥
Init.Data.List.Sublist re-elab -j4//instructions: +18.6G (+5.4%) - 🟥
Init.Prelude async//instructions: +858.2M (+7.0%) - 🟥
Std.Data.DHashMap.Internal.RawLemmas//instructions: +19.6G (+8.2%) - 🟥
Std.Data.Internal.List.Associative//instructions: +6.5G (+7.7%) - 🟥
big_beq//instructions: +640.5M (+5.6%) - 🟥
big_beq_rec//instructions: +1.5G (+7.4%) - 🟥
big_deceq//instructions: +136.3M (+2.6%) - 🟥
big_deceq_rec//instructions: +283.6M (+4.0%) - 🟥
big_do//instructions: +1.5G (+5.1%) - 🟥
big_match//instructions: +667.8M (+5.6%) - 🟥
big_match_partial//instructions: +1.1G (+6.5%) - 🟥
big_omega.lean MT//instructions: +1.9G (+7.2%) - 🟥
big_omega.lean//instructions: +1.8G (+7.2%) - 🟥
binarytrees.st//instructions: +6.8G (+11.6%) - 🟥
binarytrees//instructions: +6.8G (+11.5%) - 🟥
build//instructions: +41.0G (+0.3%) - 🟥
bv_decide_inequality.lean//instructions: +3.1G (+2.8%) - 🟥
bv_decide_large_aig.lean//instructions: +1.5G (+3.4%) - 🟥
bv_decide_mod//instructions: +7.4G (+3.3%) - 🟥
bv_decide_mul//instructions: +1.7G (+3.8%) - 🟥
bv_decide_realworld//instructions: +916.9M (+3.6%) - 🟥
bv_decide_rewriter.lean//instructions: +1.2G (+6.6%) - ✅
channel.lean//instructions: -718.5M (-1.8%) - 🟥
const_fold//instructions: +809.2M (+10.8%) - 🟥
deriv//instructions: +430.6M (+6.1%) - 🟥
grind_bitvec2.lean//instructions: +15.3G (+7.9%) - 🟥
grind_list2.lean//instructions: +4.7G (+7.6%) - 🟥
grind_ring_5.lean//instructions: +610.2M (+6.7%) - 🟥
hashmap.lean//instructions: +282.4M (+7.8%) - 🟥
identifier auto-completion//instructions: +4.7G (+6.1%) - 🟥
ilean roundtrip//instructions: +1.4G (+5.5%) - 🟥
iterators (elab)//instructions: +216.8M (+6.0%) - 🟥
iterators (interpreted)//instructions: +1.1G (+3.3%) - 🟥
lake build clean//instructions: +42.9G (+2.0%) - 🟥
lake build no-op//instructions: +223.8M (+3.7%) - 🟥
lake config elab//instructions: +89.1M (+2.8%) - ✅
lake config import//instructions: -27.4M (-1.7%) - ✅
lake config tree//instructions: -25.3M (-1.5%) - ✅
lake env//instructions: -29.9M (-1.8%) - 🟥
lake startup//instructions: +4.1M (+1.6%) - 🟥
language server startup with ileans//instructions: +1.2G (+5.5%) - 🟥
language server startup//instructions: +10.3M (+2.2%) - 🟥
liasolver//instructions: +261.6M (+6.6%) - 🟥
mut_rec_wf//instructions: +2.2G (+7.1%) - 🟥
nat_repr//instructions: +131.0M (+3.9%) - 🟥
omega_stress.lean async//instructions: +322.7M (+6.3%) - 🟥
parser//instructions: +3.2G (+6.4%) - 🟥
phashmap.lean//instructions: +544.3M (+5.6%) - 🟥
qsort//instructions: +1.1G (+7.0%) - 🟥
rbmap//instructions: +1.2G (+8.6%) - 🟥
rbmap_1//instructions: +1.0G (+7.3%) - 🟥
rbmap_10//instructions: +1.2G (+8.5%) - 🟥
rbmap_fbip//instructions: +666.6M (+9.7%) - 🟥
rbmap_library//instructions: +615.2M (+4.7%) - 🟥
reduceMatch//instructions: +1.1G (+6.2%) - 🟥
riscv-ast.lean//instructions: +10.0G (+7.7%) - 🟥
simp_arith1//instructions: +115.9M (+3.6%) - 🟥
simp_bubblesort_256//instructions: +1.1G (+7.5%) - 🟥
simp_congr//instructions: +1.4G (+7.4%) - 🟥
simp_local//instructions: +4.8G (+8.5%) - 🟥
simp_subexpr//instructions: +1.1G (+7.6%) - 🟥
tests/bench/ interpreted//instructions: +7.8G (+3.6%) - ✅
tests/compiler//sum binary sizes: -20.8M (-1.1%) - 🟥
treemap.lean//instructions: +527.5M (+2.4%) - 🟥
unionfind//instructions: +1.7G (+7.2%) - 🟥
workspaceSymbols with new ranges//instructions: +53.5M (+7.2%) - 🟥
workspaceSymbols//instructions: +1.4G (+5.5%)
Minor changes (3)
- 🟥
iterators (compiled)//instructions: +4.0M (+0.8%) - 🟥
sigma iterator//instructions: +3.0G (+7.5%) - ✅
size/libleanshared.so//bytes: -514kiB (-0.3%)
!bench
Benchmark results for 4a93cecb755ca725f839c2637c7cf6bd9cdd65bd against edcef51434804e6432c02f42ef364cb13f9149b2 are in! @Garmelon @hargoniX
Major changes (2)
- 🟥
build//instructions: +41.7G (+0.3%) - 🟥
channel.lean//instructions: +1.8G (+4.7%)
Minor changes (2)
- 🟥
iterators (elab)//instructions: +37.7M (+1.0%) - ✅
size/libleanshared.so//bytes: -486kiB (-0.2%)
!bench
Benchmark results for 28d874c30d761690997b6443a6f04ef15cc6447b against edcef51434804e6432c02f42ef364cb13f9149b2 are in! @hargoniX
Major changes (1)
- 🟥
build//instructions: +42.7G (+0.3%)
Minor changes (3)
- 🟥
Init.Data.BitVec.Lemmas re-elab//instructions: +4.3G (+0.5%) - 🟥
channel.lean//instructions: +268.8M (+0.7%) - 🟥
iterators (elab)//instructions: +37.7M (+1.0%)
Mathlib CI status (docs):
- ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlibbranch. Trygit rebase edcef51434804e6432c02f42ef364cb13f9149b2 --onto 5bd331e85d9d110a29fb3367dbb21854010ffcbd. You can force Mathlib CI using theforce-mathlib-cilabel. (2025-12-03 13:32:42) - ✅ Mathlib branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-05 11:42:10) View Log
- ✅ Mathlib branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-08 12:49:20) View Log
- ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the
nightly-with-mathlibbranch. Trygit rebase c1bc886d98e8764ba93761cdf4698a596e1e7e32 --onto 0708024c4649f11da1276d8b6d4345972a9dc57c. You can force Mathlib CI using theforce-mathlib-cilabel. (2025-12-17 11:08:15)
Reference manual CI status:
- ❗ Reference manual CI will not be attempted unless your PR branches off the
nightly-with-manualbranch. Trygit rebase edcef51434804e6432c02f42ef364cb13f9149b2 --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using theforce-manual-cilabel. (2025-12-03 13:32:43) - ✅ Reference manual branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-05 10:48:51) View Log
- 🟡 Reference manual branch lean-pr-testing-11479 build against this PR didn't complete normally. (2025-12-05 10:49:55) View Log
- ✅ Reference manual branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-08 11:52:29) View Log
- 🟡 Reference manual branch lean-pr-testing-11479 build against this PR didn't complete normally. (2025-12-08 11:53:32) View Log
- ❗ Reference manual CI will not be attempted unless your PR branches off the
nightly-with-manualbranch. Trygit rebase c1bc886d98e8764ba93761cdf4698a596e1e7e32 --onto 0708024c4649f11da1276d8b6d4345972a9dc57c. You can force reference manual CI using theforce-manual-cilabel. (2025-12-17 11:08:17)
!bench
Benchmark results for 91927dbce96dd975237d307d860683a68514a745 against edcef51434804e6432c02f42ef364cb13f9149b2 are in! @hargoniX
Major changes (2)
- 🟥
build//instructions: +50.0G (+0.4%) - 🟥
channel.lean//instructions: +835.2M (+2.2%)
Minor changes (2)
- 🟥
iterators (elab)//instructions: +37.7M (+1.0%) - 🟥
size/all/.olean//bytes: +963kiB (+0.3%)
!bench
Benchmark results for 790dd914400edcb6ae04fb035137afe7be060b7b against c1bc886d98e8764ba93761cdf4698a596e1e7e32 are in! @hargoniX
Major changes (1)
- 🟥
build//instructions: +54.0G (+0.4%)
Minor changes (2)
- 🟥
iterators (elab)//instructions: +30.7M (+0.8%) - 🟥
size/all/.olean//bytes: +986kiB (+0.3%)