Allow restricting the number of parallel linker invocations

luser opened this issue 4 years ago • 15 comments

In CI at my work, we ran into a situation where rustc would get OOM-killed while linking example binaries:

error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" <…>
  = note: collect2: fatal error: ld terminated with signal 9 [Killed]
          compilation terminated.

We were able to mitigate this by using a builder with more available memory, but it's unfortunate. We could dial down the parallelism of the whole build by explicitly passing -jN, but that would make the non-linking parts of the build slower by leaving CPU cores idle.
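
(For reference, that blunt workaround is just the standard jobs flag; the same cap can also be set with the build.jobs config value. The value below is arbitrary.)

    # Cap parallelism for the whole build, compile steps included (4 is arbitrary)
    cargo build -j 4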

It would be ideal if we could explicitly ask cargo to lower the number of parallel linker invocations it will spawn. Compile steps are generally CPU-intensive, but linking is usually much more memory-intensive. In the extreme case, for large projects like Firefox and Chromium where the vast majority of code gets linked into a single binary, that link step far outweighs any other part of the build in terms of memory usage.

In terms of prior art, Ninja has a concept of "pools" that allows expressing this sort of restriction in a more generic way:

Pools allow you to allocate one or more rules or edges a finite number of concurrent jobs which is more tightly restricted than the default parallelism. This can be useful, for example, to restrict a particular expensive rule (like link steps for huge executables), or to restrict particular build statements which you know perform poorly when run concurrently.
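
For readers unfamiliar with the feature, a minimal Ninja sketch of such a pool might look like this (the pool name and depth are illustrative, not taken from this issue):

    # Declare a pool that allows at most 2 jobs to run concurrently
    pool link_pool
      depth = 2

    # Any rule (or build edge) assigned to the pool is capped at the pool's depth
    rule link
      command = cc $in -o $out
      pool = link_pool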

The Ninja feature was originally motivated by Chromium switching its builds to Ninja and wanting to support distributed builds: there may be capacity to spawn far more compile jobs in parallel, since those can run on distributed build nodes, but link jobs have to run on the local machine and so want a lower limit.

If this were implemented, one could imagine a further step whereby cargo could estimate how heavy individual linker invocations are by the number of crates they link, and attempt to set a reasonable default value based on that and the amount of available system memory.

luser avatar Feb 09 '21 20:02 luser

I believe this would also be useful for people using sccache in distributed compilation mode, as they can hit an exaggerated version of this problem, similar to the Chromium case described above: far more build capacity for compiling than for linking.

luser avatar Feb 09 '21 20:02 luser

I have no idea if this would be practical, but could cargo automatically monitor memory usage to adjust how many concurrent threads to use?

Be-ing avatar Feb 27 '21 03:02 Be-ing

I've managed to work around this by enabling swap. Linking time did not suffer visibly. On Ubuntu, I followed this guide.
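
(The linked guide isn't preserved here, but a typical Ubuntu swapfile setup is roughly the following; the 8G size is just an example.)

    # Create and enable an 8 GiB swapfile (the size is an arbitrary example)
    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # To keep it across reboots, add "/swapfile none swap sw 0 0" to /etc/fstab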

levkk avatar Oct 26 '21 06:10 levkk

Proposed Solution

Add a --link-jobs option to specify the number of jobs used for linking. It would default to the number of parallel jobs.

Here is what the help output would look like (the -j option is shown for comparison):

-j, --jobs <N>                Number of parallel jobs, defaults to # of CPUs
--link-jobs <N>               Number of parallel jobs for linking, defaults to # of parallel jobs

sagudev avatar Mar 16 '23 09:03 sagudev

@rustbot claim

weihanglo avatar Oct 31 '23 15:10 weihanglo

@weihanglo see also #7480

epage avatar Nov 02 '23 20:11 epage

FWIW, the Cabal community had a discussion about this a while back: https://github.com/haskell/cabal/issues/1529

weihanglo avatar Nov 02 '23 21:11 weihanglo

Potentially the unstable rustc flag -Zno-link could separate the linking phase from the rest of compilation (see https://github.com/rust-lang/cargo/issues/9019), and then Cargo could control the parallelism of linker invocations. Somebody needs to take a look at the status of -Zno-link/-Zlink-only in rustc (and that is very likely me).
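
Roughly, the split would look something like this (a sketch based on those unstable flags; the exact file names and invocations may differ):

    # Compile without linking; emits a .rlink file describing the deferred link step
    rustc +nightly -Z no-link main.rs
    # Later, under whatever concurrency cap Cargo chooses, run only the link step
    rustc +nightly -Z link-only main.rlink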

weihanglo avatar Nov 03 '23 05:11 weihanglo

As this is focusing on the problem of OOM, I'm going to close in favor of #12912 so we keep the conversations in one place.

epage avatar Nov 03 '23 13:11 epage

FWIW, setting split-debuginfo = "packed" or "unpacked" in a profile should reduce the linker's memory usage. In my experiment it roughly halved the memory usage per invocation.
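
For anyone who wants to try it, that setting goes in a Cargo.toml profile, e.g. (the release profile here is just an example):

    [profile.release]
    split-debuginfo = "packed"   # or "unpacked"; support and defaults vary by platform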

Something we might want to keep an eye on in rustc: https://github.com/rust-lang/rust/issues/48762

weihanglo avatar Nov 03 '23 21:11 weihanglo

> As this is focusing on the problem of OOM, I'm going to close in favor of #12912 so we keep the conversations in one place.

I suppose, although this is a very specific problem and I'm doubtful that the generic mechanisms being discussed in that issue will really help address it.

luser avatar Nov 07 '23 14:11 luser

Thanks. Reopened, as it might need both #9019 and #12912, and maybe other upstream work in rustc, to make this happen.

weihanglo avatar Nov 07 '23 19:11 weihanglo

FWIW, there is a --no-keep-memory flag for the GNU linker. I haven't tried it, but it might help until we make some progress on this.

https://linux.die.net/man/1/ld
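
One way to experiment with that flag from Cargo is to pass it through rustc to the linker driver, e.g. (assuming a cc-style driver that forwards -Wl, options to GNU ld; untested for this issue):

    # Forward --no-keep-memory to GNU ld via rustc's link-arg codegen option
    RUSTFLAGS="-C link-arg=-Wl,--no-keep-memory" cargo build --release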

weihanglo avatar Nov 09 '23 19:11 weihanglo

https://github.com/rust-lang/rust/pull/117962 has made it into nightly. It could alleviate the pain of linker OOMs to some extent.

weihanglo avatar Dec 13 '23 02:12 weihanglo

> FWIW, there is a --no-keep-memory flag for the GNU linker. I haven't tried it, but it might help until we make some progress on this.

I suspect this will make performance much worse in the average case, unfortunately.

luser avatar Dec 13 '23 13:12 luser