spago icon indicating copy to clipboard operation
spago copied to clipboard

Parsing spago.lock and globbing the workspace is slow

Open Blugatroff opened this issue 1 year ago • 5 comments

spago build takes surprisingly long to read it's configuration files.

When using my somewhat older laptop i noticed, that an invocation of spago build on my smallish AdventOfCode project took 3.5 seconds, even though everything had already been compiled. I tried investigating what took so long by inserting logDebug and using the --verbose flag.

I noticed that running purs isn't actually the single most expensive step (took 800ms), instead globbing the workspace for every spago.yaml and parsing spago.lock takes quite long. I usually expect a package manager to finish very quickly when there is no actual work to be done, even on slightly older hardware. The old spago just takes 1.5s on the same project and device, so is this just the cost of writing spago in a language that compiles to javascript, or can we reduce this startup time? Or perhaps some of these steps can be skipped in certain cases?

How I measured: time node ~/clones/spago/bin/bundle.js build --verbose shows:

[      25ms] CWD: /home/jasper/projects/pure-advent
[      50ms] Global cache: "/home/jasper/.cache/spago-nodejs"
[      52ms] Local cache: "/home/jasper/projects/pure-advent/.spago"
[     112ms] DB: Connecting to database at /home/jasper/.cache/spago-nodejs/spago.v1.sqlite
[     122ms] Reading Spago workspace configuration...
[     122ms] Reading config from spago.yaml
[     164ms] Before Glob.match **/spago.yaml                    I Added this logDebug
[     698ms] After Glob.match **/spago.yaml                       Took about 550ms
[     701ms] Selecting package aoc from ./
[     702ms] Before FS.readYamlFile                             I Added this logDebug
[    1540ms] After FS.readYamlFile                               Took about 800ms
[    1541ms] Lockfile is up to date, using it
[    1542ms] Found lockfile, using the package set from there
[    1548ms]
✅ Selecting package to build: aoc

[    1549ms] Package path: ./
[    1550ms] Requested to install these packages: []
[    1556ms] Downloading dependencies...
[    1619ms] Building...
[    1655ms] Running command:
purs compile --codegen corefn,docs,js,sourcemaps --json-errors .spago/p/aff-promise/3aa74e68e . . . . . . . . .
           Src   Lib   All
Warnings     0     0     0
Errors       0     0     0
[    2464ms]
✅ Build succeeded.

[    2464ms] Looking for unused and undeclared transitive dependencies...
[    2465ms] Running command:
purs graph .spago/p/aff-promise/3aa74e68e3e4c3e . . . . . . . . . . .
[    2659ms] Called `purs graph`, decoding..

real	0m3,561s
user	0m7,615s
sys	0m1,242s

Blugatroff avatar Jan 26 '24 14:01 Blugatroff

Thanks for the detailed report @Blugatroff! We had a similar report in #1140, but that's unfortunately very light on details which made it hard to investigate.

Indeed 3.5s is a long time and I'd like it to be shorter, and think that it's possible to shorten it - according to your logs the globbing is 500ms, and reading the lockfile is 800. The first one is not too surprising but the latter is! Do you have this repo hosted somewhere so I could take a look?

f-f avatar Jan 26 '24 15:01 f-f

I believe a recursive walk using just FS.readDir with early exclusion of gitignore locations should do good enough work.

wclr avatar Jan 26 '24 16:01 wclr

Do you have this repo hosted somewhere so I could take a look?

Yes @f-f , https://github.com/Blugatroff/pure-advent/

Could the globbing specifically also take long because there are a lots of deeply nested directories in my project directory which get searched unnecessarily? I'm talking about stuff like .psci_modules, generated-docs, output, output-es, .spago

Blugatroff avatar Jan 26 '24 17:01 Blugatroff

Could the globbing specifically also take long because there are a lots of deeply nested directories in my project directory which get searched unnecessarily?

That was the general conclusion I reached in #1140, but there wasn't a reproducible repo to confirm that.

JordanMartinez avatar Jan 26 '24 20:01 JordanMartinez

Maybe we can speed it up with something like Cargo.toml's members. (A benchmark should come before a solution, of course).

bakhtiyarneyman avatar Feb 16 '24 02:02 bakhtiyarneyman

@Blugatroff I had a look at your pure-advent repo, and I can't repro the slow globbing - it takes about 30ms to read the globs, 250ms to read the lockfile, and 300ms to run a no-op purs build.

However, in an effort to dogfood ourselves a repo with a large amount of files, in #1197 I am adding a large number of text fixtures - this makes it easier to write tests, but it also adds a good number of directories to traverse for spago. On that branch, I can see some globbing slowdown: it takes 400ms to glob the configs, only 150ms to read the lockfile, and about 500ms to run a no-op purs build.

400ms should be large enough to serve as a benchmark for further patches. The slow bit of code is here: https://github.com/purescript/spago/blob/455d94a850dcc21e7c6e237486ea170dff49fd4f/src/Spago/Config.purs#L177-L187

There are two possible sources for the slowdown here:

  1. the globbing in line 178, where fast-glob is not being fast enough
  2. the check for gitignored files, where in line 182 we are calling git for every single config file that is matched by the glob

And we have a good trail we could follow:

  1. there is an issue about this in fast-glob, with mentions about using fs.walk and fs.readdir
  2. that ticket mentions another one in globby, which provides a code snippet that has a 8000% improvement over globby perf. This would solve our gitignore issue too.
  3. expanding on that, there's a whole fast implementation of globby in this PR (linked from the fast-glob ticket). I'm not sure we'll need all of that, so the snippet in the previous point might suffice

f-f avatar Mar 09 '24 09:03 f-f

I have investigated this a bit further, i tried the code snippet you linked and it is indeed much faster.

While doing this, i also figured out which of my directories is responsible for most of the slowdown. It's my .direnv directory generated by the nix-direnv tool, it contains a bunch of symlinks to packages in my nix-store all of which get traversed by fast-glob. Globbing only .direnv for **/spago.yaml took (on my slow laptop which triggered this issue) a glorious 2.07 seconds on first run and 575ms on subsequent runs (because of disk caching i would guess). Globbing for **/spago.yaml using the snippet only takes ~30 ms

So ignoring gitignored directories before/while globbing instead of filtering afterwards is a huge improvement.

Blugatroff avatar Apr 02 '24 00:04 Blugatroff

I have merged the PR linked above, and parsing times have improved for Spago itself as well - it now takes 150ms to glob the workspace (vs 400ms before the patch).

I have released the patch in a new version (0.93.29), could you try this out @wclr?

We could still squeeze out some more performance out of this - the ideal would be to just drop fast-glob, and instead have our own implementation that uses fs.walk and fs.readdir, and parses all the gitignore files. I'll leave this issue open to track this effort

f-f avatar Apr 02 '24 11:04 f-f

I ran the latest version of spago against my project and got this: image

So it (reading config step) took more than a minute.

Note, I did this test with the build from this repo, the latest published version on npm (0.93.29) differs and hangs longer on this task (I didn't wait for it).

Also, I had a glance at the latest PR with the changes, I didn't dig into it, but I just put a debug output on reading of ".gitignore" file and I got this while running spago install: image

It is reading all the gitignore files... but ".spago" is included in the root .gitignore, so I doubt that those files should be even seen or touched, this makes me think that the latest used method of waking a project dir may be not very efficient too.

wclr avatar Apr 29 '24 12:04 wclr

I think the first step here is to figure put - likely with a small test - if the issue here occurs only on windows, since performance on mac and linux has improved significantly and spago is not looking at those files there.

f-f avatar Apr 29 '24 12:04 f-f