jj icon indicating copy to clipboard operation
jj copied to clipboard

FR: make tests faster to recompile

Open jyn514 opened this issue 5 months ago • 17 comments

Is your feature request related to a problem? Please describe. Recompiling jj's tests after a change is slow; there are 81 separate binaries that have to be linked and compiled. I have a fairly fast machine (Ryzen 7700x), but even here it takes about 50 seconds to recompile, longer than actually running the tests.

Describe the solution you'd like Use a shared integration test binary, and mod test_abandon_command etc for loading separate tests. That only has to link two binaries (one for lib/, one for cli/), and should greatly reduce compile times.

Describe alternatives you've considered

  • Don't make any changes
  • Switch from integration tests to unit tests. This might be possible but limits jj to tests that can be run in parallel, and is likely annoying to implement.

Additional context HEAD is 01825953aca5b9aab6d7e9db7777818482f99914; i've made local changes but none that should affect compile times.

jyn514 avatar Feb 02 '24 06:02 jyn514

Re your actual suggestions: I'm not an expert, so I don't have many comments. You can try these changes and see if it speeds things up. If it does, many people including myself will benefit :) Perhaps somebody who understands Rust compilation better will also chime in.

Some more general thoughts:

I've had compilation take even longer, it is not fun. Rust likes to eat CPU.

You can try the general suggestions, if you haven't already.

  • Make sure you're using nextest, though this might speed up test execution more than the compilation.

  • If you are on Linux, you should try using mold, especially if you think that the linker using more cores would speed things up.

If you're on Windows, malware-protecting services (e.g. Windows Defender) could slow things up a lot. On a Mac, I had problems with that, and I'm guessing Windows is similar.

ilyagr avatar Feb 02 '24 06:02 ilyagr

right, nexttest can't help with test compilation because it still has to run cargo build to generate the test harnesses. i'm on linux, i can try mold.

jyn514 avatar Feb 02 '24 06:02 jyn514

out of curiosity, how long do the tests take for you to recompile? is it about a minute as well?

jyn514 avatar Feb 02 '24 07:02 jyn514

First try (recompiled a bunch of dependencies too, but not all): 32.55 s

After deleting target/: 38.4s. timings.zip

Before I disabled the single-threaded malware detector, it was closer to 50s.

This is on a 12-core Apple M2 Mac. Macs use a multi-threaded linker similar to mold AFAIK.

ilyagr avatar Feb 02 '24 07:02 ilyagr

But re your original idea: it'd be really cool if it helps! :)

ilyagr avatar Feb 02 '24 07:02 ilyagr

with mold it seems to take ... almost twice as long??? github isn't letting me upload the timings for some reason but here they are rendered: https://jyn514.github.io/assets/cargo-timing-20240202T071243Z.html

jyn514 avatar Feb 02 '24 07:02 jyn514

Bizarre. No idea why that would happen, and I'm not an expert on mold either. For the timings, you could zip them to make GitHub happy (it filters by extensions), but I like your way of sharing the html better.

ilyagr avatar Feb 02 '24 07:02 ilyagr

oh lol i was using a version of mold from several years ago

with a new version it works faster than lld, but not that much faster, it's still about a second per test. https://jyn.dev/assets/cargo-timing-20240202T072514Z.html

i'll try combining them into a single integration test.

jyn514 avatar Feb 02 '24 07:02 jyn514

Just fyi, there's a previous attempt: #2139

Another workaround is to move cli tests to an empty crate. Since these tests just run the jj executable, it works. The problem is that we can't express a binary dependency, so there's no way to ensure jj is recompiled.

yuja avatar Feb 02 '24 07:02 yuja

In the long run, this might not be what we want, particularly if we break jj into separate crates for subcommands (see https://blog.waleedkhan.name/rust-incremental-test-times/). Generally, I do iterative work by running only one or two test modules, and only run all of them later (or let CI do it), and I think putting all the integration tests into the same crate pessimizes that workflow.

@arxanas hmm, have you benchmarked the difference? if i checkout #2139 and run touch lib/src/lib.rs ; c t --no-run with a fresh cache, it takes only 3.5 seconds to recompile all tests. the previous commit takes 11 seconds for all tests, and seven seconds for touch lib/src/lib.rs ; c t --no-run --test test_working_copy_sparse (because it has to rebuild dependencies as well; with those cached it takes 1.4 seconds, which is better than 3.5 but ... not by that much).

i can try this again with a modern branch of course, a lot of tests have been added since then, but i suspect the time per test will be roughly the same.

btw, re

The cargo build timings for the incremental build doesn’t help. It just shows that the test I’m building takes 100% of the spent time.

the flag you want is cargo rustc -- -Ztime-passes

jyn514 avatar Feb 02 '24 07:02 jyn514

with a new version it works faster than lld, but not that much faster,

If people keep having this experience, we could remove the mold recommendation from the contributing docs.

I had one really good experience where mold sped everything up by 2x (for ARM Linux running in QEMU on my Mac), and one experience with AMD where mold didn't speed things up at all.

ilyagr avatar Feb 02 '24 07:02 ilyagr

i can try this again with a modern branch of course, a lot of tests have been added since then, but i suspect the time per test will be roughly the same.

i've done this; it's now ~4 seconds for all tests

;  hyperfine 'touch lib/src/lib.rs ; cargo t --test runner --no-run'
  Time (mean ± σ):      4.055 s ±  0.123 s    [User: 3.591 s, System: 1.593 s]
  Range (min … max):    3.804 s …  4.159 s    10 runs

compared to 32-50 seconds for all tests and 2 seconds for a single test before

Benchmark 1: touch lib/src/lib.rs ; cargo t --no-run --test test_local_working_copy_concurrent
  Time (mean ± σ):      2.106 s ±  0.072 s    [User: 1.361 s, System: 0.653 s]
  Range (min … max):    2.020 s …  2.257 s    10 runs

jyn514 avatar Feb 02 '24 07:02 jyn514

with a new version it works faster than lld, but not that much faster,

If people keep having this experience, we could remove the mold recommendation from the contributing docs.

iirc, the default on Linux is GNU ld. clang lld is much faster, and mold is somewhat faster than lld.

yuja avatar Feb 02 '24 07:02 yuja

fwiw, autotests = false will help if we want to combine all tests without reorganizing directory structure. https://doc.rust-lang.org/cargo/reference/cargo-targets.html#target-auto-discovery

yuja avatar Feb 02 '24 08:02 yuja

yup, i'm using that :) and adding a test to make sure people don't forget the mod foo; declaration

jyn514 avatar Feb 02 '24 08:02 jyn514

iirc, the default on Linux is GNU ld. clang lld is much faster, and mold is somewhat faster than lld.

FWIW it's codebase dependent but in many cases for real Rust codebases, mold can be up to 10x faster, e.g. I've had it take 30 second link times (which are nearly unbearable) to less than 3 seconds. I suspect it'll always be somewhere in this range ("a little to an unbelievable amount") in terms of improved perf.

The reality is that we don't have many crates and dependencies right now, so while our iteration time is somewhat slower, it puts a bit less pressure on the linker since there's not as much work to do. Our compile times aren't too long, so only having two crates isn't that bad. But realistically as jj gets bigger, the compilation step will begin to take longer and longer, and we will probably need to start splitting things out more into crates due to that being the way to achieve parallel compilation; that is when mold will begin to shine even more.

thoughtpolice avatar Feb 02 '24 17:02 thoughtpolice

I responded to this effect at https://github.com/martinvonz/jj/pull/2914, but I don't think the test compilation time by itself is what we should be optimizing for, but rather the total test runtime for the full-test and incremental-test cases.

arxanas avatar Feb 03 '24 02:02 arxanas