Distributed.jl

need precompile statements re-enabled for `addprocs` (with PR)

[Open] non-Jedi opened this issue 5 years ago • 8 comments

As discovered in https://discourse.julialang.org/t/help-with-binary-trees-benchmark-games-example/37307/13

❯ hyperfine -w1 "julia -p4 -E 'using Distributed; nprocs()'" "julia -E 'using Distributed; addprocs(); nprocs()'"
Benchmark #1: julia -p4 -E 'using Distributed; nprocs()'
  Time (mean ± σ):      2.040 s ±  0.010 s    [User: 5.563 s, System: 0.773 s]
  Range (min … max):    2.024 s …  2.054 s    10 runs
 
Benchmark #2: julia -E 'using Distributed; addprocs(); nprocs()'
  Time (mean ± σ):      1.785 s ±  0.014 s    [User: 5.337 s, System: 0.756 s]
  Range (min … max):    1.765 s …  1.816 s    10 runs
 
Summary
  'julia -E 'using Distributed; addprocs(); nprocs()'' ran
    1.14 ± 0.01 times faster than 'julia -p4 -E 'using Distributed; nprocs()''

Is there a reason that spawning the extra processes with `addprocs()` is necessarily faster than spawning them with the `-p` command-line argument?

non-Jedi, Apr 10 '20

Probably because the `addprocs` version is already compiled; see https://github.com/JuliaLang/julia/blob/0c284839fef6c8c153edc01fddfa37a9f5ac6752/contrib/generate_precompile.jl#L44-L45.

fredrikekre, Apr 11 '20

@fredrikekre did you close this because there's no way to get similar speed for `-p4`?

non-Jedi, Apr 15 '20

It doesn't seem like this should have been closed. It should be just as fast, and `-p` is needed so that the choice of worker count is in the hands of the user, not the programmer. See also: https://github.com/JuliaLang/julia/issues/35830#issuecomment-626825539

PallHaraldsson, May 26 '20

Should that issue be closed and this one opened then?

KristofferC, May 26 '20

No, keep both open. Mine is not a duplicate (it is about scalability); while the two issues are slightly different, the cause may or may not be the same.

First, I saw no difference for this issue on Julia 1.0 with default settings, nor on the most recent build, provided I use only these settings:

$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia  --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'
  Time (mean ± σ):      1.320 s ±  0.011 s    [User: 3.226 s, System: 2.114 s]
  Range (min … max):    1.304 s …  1.333 s    10 runs
 
$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''
  Time (mean ± σ):      1.323 s ±  0.008 s    [User: 3.259 s, System: 2.020 s]
  Range (min … max):    1.309 s …  1.335 s    10 runs

With default settings there is a difference, and even with `-O0` alone the min…max ranges do not overlap. Since I've seen that setting eliminate invalidations, I would guess invalidations are implicated?

PallHaraldsson, May 26 '20

Now the performance is reversed, so problem solved!

vtjnash@deepsea4:~/julia$ hyperfine -w1 "./julia -p4 -E 'using Distributed; nprocs()'" "./julia -E 'using Distributed; addprocs(); nprocs()'"
Benchmark 1: ./julia -p4 -E 'using Distributed; nprocs()'
  Time (mean ± σ):      8.952 s ±  1.129 s    [User: 26.344 s, System: 0.740 s]
  Range (min … max):    8.058 s … 10.398 s    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (10.222 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: ./julia -E 'using Distributed; addprocs(); nprocs()'
  Time (mean ± σ):     14.585 s ±  0.315 s    [User: 62.846 s, System: 2.424 s]
  Range (min … max):   14.057 s … 14.948 s    10 runs
 
Summary
  './julia -p4 -E 'using Distributed; nprocs()'' ran
    1.63 ± 0.21 times faster than './julia -E 'using Distributed; addprocs(); nprocs()''

This clearly needs more precompile statements. Now that Distributed is a separate stdlib, that is much more reasonable than when it was included in the default image.

vtjnash, Feb 11 '24

Code at https://github.com/JuliaLang/julia/pull/42156

vtjnash, Feb 11 '24

@KristofferC Should we go ahead and enable the precompile statements?

ViralBShah, Feb 11 '24