package precompilation oversubscribes cpus

Open JeffBezanson opened this issue 6 months ago • 7 comments

Now that we can do multi-threaded codegen, and with package precompilation using all available CPUs, I think we are running too many threads, especially since they tend to use a fair amount of memory. I have seen this make my system unresponsive and sometimes trigger the OOM killer.

First, I suggest precompilation use cpu_threads/2 like codegen does, which is better when you have hyperthreading, and even if you don't, people will appreciate not using their whole system. Another option is to give precompilation some way to tell julia not to do parallel codegen. But that seems worse since those threads don't use as much memory as a process, and it would get complicated since we'd still want multithreaded codegen e.g. if you're only compiling one package.

JeffBezanson avatar May 30 '25 21:05 JeffBezanson

We could consider using a jobserver like make.

Keno avatar May 30 '25 21:05 Keno

Prototype of jobserver in https://github.com/JuliaLang/julia/issues/52122#issue-1988589551

giordano avatar May 30 '25 22:05 giordano

We'll need an alternate implementation on Windows, I guess?

DilumAluthge avatar May 31 '25 01:05 DilumAluthge

I think it would make sense to actually make it compatible with the make jobserver protocol (which uses named semaphores on Windows). Not sure you'd run precompile in a makefile that often, but perhaps you might run some make-jobserver-aware clients during precompile.
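For readers unfamiliar with the protocol, here is a minimal sketch of the POSIX pipe flavor of a make-style jobserver, written in Python purely for illustration. The class name `PipeJobserver` and its methods are hypothetical and are not part of Julia or make. It assumes the usual convention that each participant owns one implicit slot and the pipe holds the remaining `slots - 1` tokens. Real clients typically block on the read; the non-blocking read here is a simplification so the sketch never hangs.

```python
import os


class PipeJobserver:
    """Hypothetical token pool modeled on the make jobserver pipe (POSIX only)."""

    def __init__(self, slots: int):
        # One slot is implicit for the owner, so only slots - 1 tokens go in the pipe.
        self.read_fd, self.write_fd = os.pipe()
        os.set_blocking(self.read_fd, False)
        os.write(self.write_fd, b"+" * (slots - 1))

    def try_acquire(self):
        """Take one token if available; return None when the pool is empty."""
        try:
            return os.read(self.read_fd, 1)
        except BlockingIOError:
            return None

    def release(self, token: bytes) -> None:
        """Return the same token byte that was acquired."""
        os.write(self.write_fd, token)
```

A worker would call `try_acquire()` before starting each extra thread or job and `release(token)` when that unit of work finishes, which is what lets independent processes share one global limit.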

Keno avatar May 31 '25 01:05 Keno

Whatever the final implementation looks like, please consider making it CFS aware - ref https://github.com/JuliaLang/julia/issues/46226.

Seelengrab avatar May 31 '25 03:05 Seelengrab

A jobserver sounds good.

However I think in practice the oversubscription doesn't happen much because the sysimage generation is usually <40% of the precompile job, so the chance of those overlapping is relatively small, and for small pkgimages that time is an even smaller % and image gen parallelism is disabled for small images.

I think it's mostly an issue when an env is being precompiled that has multiple independent long tails, like GLMakie and DiffEq. In those cases I have wondered if we could crudely reduce the number of image threads.

So start as it is and back it off as the tree advances.

But that doesn't account for multiple Julia processes running at the same time, though basically nothing in Julia does that. Pidlocks on precompilation kind of do if there's no further work to be done, but if there is, it oversubscribes.

So a jobserver sounds good.

IanButterworth avatar Jun 01 '25 11:06 IanButterworth

I just realized I mistook what was being talked about here.

There are two numbers that affect subscription during parallel precompilation:

  • JULIA_NUM_PRECOMPILE_TASKS - The number of packages precompiled in parallel - Defaults to Sys.CPU_THREADS + 1 or Sys.CPU_THREADS / 2 on windows
  • JULIA_IMAGE_THREADS - The number of threads each worker can start when generating the pkgimage - defaults to jl_cpu_threads() / 2 (but a single thread is used for small pkgimages)

From experience, I think the oversubscription effect is mainly felt when multiple larger packages are being precompiled in parallel and all hit their image generation phase at the same time, where they each are allowed to spawn jl_cpu_threads() / 2 threads.
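To put a number on that worst case, here is a small arithmetic sketch in Python (illustrative only, not Julia's actual code). The function name is hypothetical, and integer division for the `/ 2` defaults is an assumption about rounding. It multiplies the two defaults listed above, assuming every parallel precompile task is in its image generation phase at once and none of the pkgimages is small enough to fall back to a single thread.

```python
def worst_case_image_threads(cpu_threads: int, windows: bool = False) -> int:
    """Upper bound on image-generation threads if all tasks overlap.

    Uses the defaults quoted in the discussion; rounding via integer
    division is an assumption.
    """
    # JULIA_NUM_PRECOMPILE_TASKS default: CPU_THREADS + 1, or CPU_THREADS / 2 on Windows
    precompile_tasks = cpu_threads // 2 if windows else cpu_threads + 1
    # JULIA_IMAGE_THREADS default: cpu_threads / 2, at least one thread
    image_threads = max(1, cpu_threads // 2)
    return precompile_tasks * image_threads


print(worst_case_image_threads(16))                # 17 tasks * 8 threads = 136
print(worst_case_image_threads(16, windows=True))  # 8 tasks * 8 threads = 64
```

In practice the overlap is rarely total, but the product shows why a 16-thread machine can briefly end up with far more runnable threads than cores.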

A job server for the image generation phase is what I took this discussion to be about, which I'm in support of.

One wrinkle though is that I'm not sure we could easily let a worker expand back up when another worker finishes, as the number of threads to use is dictated at the start of the image generation phase.

The crude thing can be done by Base.precompile_pkgs by setting JULIA_IMAGE_THREADS dynamically based on the number of running jobs at the time the worker is spawned, but I think that risks being a bit too conservative, as who knows how many workers will be running their imaging phase at the same time. That might slow some people's precompilation down.
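A rough sketch of that crude approach, in Python for illustration only: the helper name and the even-split policy are assumptions, not what Base.precompile_pkgs does. It divides the default image thread budget by the number of jobs running when the worker is spawned and never goes below one thread.

```python
def image_threads_for_new_worker(cpu_threads: int, running_jobs: int) -> int:
    """Hypothetical policy: split the default budget evenly across running jobs."""
    budget = max(1, cpu_threads // 2)  # default JULIA_IMAGE_THREADS, rounding assumed
    return max(1, budget // max(1, running_jobs))


def worker_env(cpu_threads: int, running_jobs: int) -> dict:
    """Environment override to pass to the spawned precompile worker."""
    n = image_threads_for_new_worker(cpu_threads, running_jobs)
    return {"JULIA_IMAGE_THREADS": str(n)}
```

With 16 CPU threads, a lone worker would get 8 image threads and a worker spawned alongside three others would get 2. The value is fixed at spawn time, which is exactly the conservatism described above: a worker that starts while the system is busy stays small even after its neighbors finish.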

IanButterworth avatar Jun 10 '25 21:06 IanButterworth

If oversubscription is caused by LLVM codegen, this might be solved upstream if they implement the jobserver as suggested at https://discourse.llvm.org/t/rfc-adding-gnu-make-jobserver-support-to-llvm-for-coordinated-parallelism/87034. Implementing it in Julia would still be useful for the non-LLVM parts, and jobservers should be composable, so it wouldn't be wasted effort.

giordano avatar Jun 24 '25 22:06 giordano