lake icon indicating copy to clipboard operation
lake copied to clipboard

fix: do not compute transitive imports for every module

Open gebner opened this issue 1 year ago • 10 comments

Addresses #130. Reduces the runtime of the benchmark in #130 from 8.4s to 1.7s on my laptop.

gebner avatar Oct 14 '22 22:10 gebner

I am shocked this matters so much as lake is still computing transitive imports for quite a few modules.

I suspect that if you turned on precompiled modules or used globs to build the library (i.e., to catch every module in subdirectory into a single build with a root module that imports everything) this would be no different (as it would then need to compute transitive imports for everything anyway). I would be interested to see if I am correct about this.

Regardless, as to the code itself, if we are no longer caching transitive imports, it makes sense to inline the behavior into most of its use sites (as otherwise it often just computing a list to immediately iterate over it and map the imports to a new list).

tydeu avatar Oct 15 '22 00:10 tydeu

if you turned on precompiled modules

Yeah, turning on precompiled modules will still cause the quadratic behavior at the moment. But we will only use them in mathlib if we can selectively enable them for tactics only. (And there are comparatively few files with tactics, reducing the impact.)

used globs to build the library

Okay, say you structure a library so that a Foo/Bar/ directory has a corresponding Foo/Bar.lean import file with all files in Foo/Bar/. And assume that every directory has ten children, up to a depth N. Then in the the first layer, the 10 import files each contain 10^(N-1) direct imports. On the second layer, we have 10*10 import files, each with 10^(N-2) direct imports, etc. So, in total you have around N*10^N imports from these import files.

There are probably a few off-by-ones in the calculation, but the number of imports would not be quadratic, it would be roughly depth * number of files. (And the depth is practically bounded, I don't think anybody wants to write module names with more than a dozen components...)

gebner avatar Oct 15 '22 00:10 gebner

Yeah, turning on precompiled modules will still cause the quadratic behavior at the moment. But we will only use them in mathlib if we can selectively enable them for tactics only. (And there are comparatively few files with tactics, reducing the impact.)

Oh, previous discussions had lead me to believe that we had decided on just turning precompiled modules on for the whole library (since some complex tactics import some mathlib files high up in the import tree so most of mathlib would likely end up being precompiled anyway).

Second, since the new implementation no long caches transitive imports, isn't the precompiled modules algorithmic even slower (since it will recompute the transitive imports for a module each time it is visited)? If I am correct, we should probably to store the transImports in a facet like imports to leverage the build cache to avoid such overhead.

Okay, say you structure a library so that a Foo/Bar/ directory has a corresponding Foo/Bar.lean import file with all files in Foo/Bar/..

Sorry, no that was not what I was saying -- that is the current non-glob approach (with a single root file that imports its dependencies). The glob approach (using the globs field of the lean library config) does a recursive walk of the file tree and then builds every module it found. Since building a module computes its transitive dependencies, this leads to computation fo transitive imports for every module (just like when the precompiled setting is on).

tydeu avatar Oct 15 '22 03:10 tydeu

since it will recompute the transitive imports for a module each time it is visited

As far as I can tell, all of the recBuild* functions where transImports is called are only called once per module. Is my mental model off?

Is anything missing from this PR?

gebner avatar Oct 18 '22 17:10 gebner

@gebner

Is anything missing from this PR?

I did a quick review and if the two comments above are addressed, everything should be good for a merge.

tydeu avatar Oct 19 '22 18:10 tydeu

Okay, I've tried to understand the logic behind the precompiled modules code so that I can fix it properly. One thing that confuses me is that we treat direct and indirect imports differently (which has the effect that we call transImports 40x per file in my benchmark, once for every import..).

If I understand it correctly, then we only enable precompiled modules if one of the direct imports is precompiled. But this is clearly not sufficient, even if we only allow enabling precompiled modules per package (and not per-module as we'd like for mathlib). For example, let's say we enable precompiled modules for the aesop dependency but not for mathlib itself, and Mathlib.Tactic.Tauto imports Aesop.

import Mathlib.Tactic.Tauto
-- AFAICT, this file won't use the precompiled module for `Aesop`

Did I miss something obvious? (I didn't actually try it out.)

gebner avatar Oct 21 '22 02:10 gebner

@gebner

If I understand it correctly, then we only enable precompiled modules if one of the direct imports is precompiled ... Did I miss something obvious?

No, I think you are right, and that is a bug. 😞

tydeu avatar Oct 21 '22 02:10 tydeu

The fix, though, I think is simple. We just need to pass the module's transitive imports rather than the direct imports here: https://github.com/leanprover/lake/blob/84dd6d75c722f24955cefd422114c1e7f4da2d40/Lake/Build/Module.lean#L94

tydeu avatar Oct 21 '22 02:10 tydeu

Uhm, that one-line change will just make it compute the transitive imports of every transitive import.

My main takeaway here is that we need to go through the precompiled modules codepath for every module, we can't just ignore it like we do now. In particular, we need to do something more clever than go through all transitive imports for every module (because that's quadratic).

gebner avatar Oct 21 '22 02:10 gebner

@gebner

In particular, we need to do something more clever than go through all transitive imports for every module (because that's quadratic).

While I can see potentially ways to decrease the workload we do per transitive import (e.g.,, return a Job containing the transitive precompiled dynlibs set as part of the module build). I can't think of any way to remove the need to do workload proportional to that for each module (e.g., we would still need to combine the dynlib lists of each import which is still at least worst case linear work per module and thus quadratic overall).

tydeu avatar Oct 21 '22 03:10 tydeu

Fwiw, in the Nix prototype I forward-propagate the set of dynlibs to be loaded unconditionally. While in theory this can be quadratic in the number of modules, in practice it is clear we should not let this set grow too large anyway since actually loading the libraries is just as expensive. See also the precompilePackage Nix option.

Kha avatar Nov 08 '22 10:11 Kha

Note also the optimization of resetting the dynlib closure at actually precompiled modules, which is correct on Unix platforms with proper dynlib dependency tracking (Linux/macOS at the very least): https://github.com/leanprover/lean4/blob/9f2182fdd9cde8ab5464ccb03528c794bafef761/nix/buildLeanPackage.nix#L181

Kha avatar Nov 08 '22 10:11 Kha

@gebner I just pushed a commit that incorporated some of your changes. I am curious what benefit, if any, the update has on performance. Could you run your benchmark against the new update?

tydeu avatar Nov 29 '22 14:11 tydeu

I am curious what benefit, if any, the update has on performance. Could you run your benchmark against the new update?

The benchmark is literally running lake print-paths on the inundation repo:

$ time lake print-paths Inundation
{"srcPath":["./."],"oleanPath":["./build/lib"],"loadDynlibPaths":[]}
7.39user 1.21system 0:07.09elapsed 121%CPU (0avgtext+0avgdata 391848maxresident)k
128inputs+25616outputs (1major+25773minor)pagefaults 0swaps
$ time ~/lake/build/bin/lake print-paths Inundation
{"srcPath":["./."],"oleanPath":["./build/lib"],"loadDynlibPaths":[]}
1.86user 2.18system 0:02.14elapsed 188%CPU (0avgtext+0avgdata 351428maxresident)k
26920inputs+25616outputs (10major+23928minor)pagefaults 0swaps

So roughly a 4x improvement!

gebner avatar Dec 02 '22 18:12 gebner

@gebner In that case, I will consider the core issue of this PR reasonably addressed and close it now. Thanks for all your help!

tydeu avatar Dec 02 '22 19:12 tydeu