
Multiversioning is order dependent?

Open simonbyrne opened this issue 2 years ago • 5 comments

I'm on an HPC system with a few different architectures:

The login node is skylake:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 32 virtual cores
Environment:
  LD_LIBRARY_PATH = /central/software/julia/1.9.0/lib:/central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib
  LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib

and the compute node is broadwell:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
  Threads: 1 on 28 virtual cores
Environment:
  LD_LIBRARY_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib:/central/software/julia/1.9.0/lib:/central/slurm/install/current/lib/
  LD_RUN_PATH = /central/software/CUDA/11.8/lib64:/central/software/CUDA/11.8/extras/CUPTI/lib64:/central/software/CUDA/11.8/targets/x86_64-linux/lib

I'm calling Pkg.precompile() on the login node, then using CUDA on the compute node.

1. the default

If I don't set anything, then loading CUDA on the compute node triggers precompilation again. With JULIA_DEBUG=all set, I get the following debug message:

┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_OHRW8.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140

(and similar for CUDA.jl's deps)
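
A minimal sketch of how these messages can be surfaced, assuming `julia` is on the PATH and CUDA.jl is installed in the active environment. The function name `show_cache_rejections` is hypothetical. Debug-level log messages are written to stderr, so the sketch merges stderr into stdout before filtering.

```shell
# Hypothetical helper: list the cache files that Julia rejects while loading CUDA.
# Assumes `julia` is on PATH and CUDA.jl is installed in the active environment.
show_cache_rejections() {
    # JULIA_DEBUG=all enables debug logging for every module;
    # the log goes to stderr, so redirect it into the pipe before grepping.
    JULIA_DEBUG=all julia -e 'using CUDA' 2>&1 | grep 'Rejecting cache file'
}
```

Running `show_cache_rejections` on the compute node after precompiling on the login node should print one line per rejected cache file, if any are rejected.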

2. setting JULIA_CPU_TARGET=broadwell

Since broadwell is supported by both nodes, this seems to work as intended. I do get the following warning:

┌ Debug: Rejecting cache file /central/software/julia/1.9.0/share/julia/compiled/v1.9/Statistics/ERcPL_Stp2R.ji for Statistics [10745b16-79ce-11e8-11f9-7d13ad32a3b2] since the flags are mismatched
│   current session: use_pkgimages = true, debug_level = 1, check_bounds = 0, inline = true, opt_level = 2
│   cache file:      use_pkgimages = true, debug_level = 1, check_bounds = 1, inline = true, opt_level = 2
└ @ Base loading.jl:2690

but it doesn't appear to cause any issues (perhaps since Statistics isn't built as a pkgimage?).

3. setting JULIA_CPU_TARGET='skylake;broadwell'

This does not appear to work, and gives the same behavior as 1:

┌ Debug: Rejecting cache file /home/spjbyrne/.julia/compiled/v1.9/CUDA/oWw5k_Qcjfa.ji for CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] since pkgimage can't be loaded on this target
└ @ Base loading.jl:2706
┌ Debug: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Base loading.jl:2140

(and similar for dependencies)

cc @vchuravy

simonbyrne, Jun 12 '23 21:06

So 2. is not an issue; we just checked another cache file on the way.

For 3. Could you try: JULIA_CPU_TARGET='broadwell;skylake'?

vchuravy, Jun 15 '23 12:06

One thing we discussed is to save the cpu_target string of the sysimg and use that as the default for pkgimages. This would mitigate 1., but would increase cache-time.

@simonbyrne for you this would be identical to setting: generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1).

x-ref: https://github.com/JuliaCI/julia-buildkite/issues/298

vchuravy, Jun 15 '23 13:06

> For 3. Could you try: JULIA_CPU_TARGET='broadwell;skylake'?

Yes, that appears to work (in that it doesn't trigger recompilation).

> this would be identical to setting: generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1).

That also works.
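
For reference, a sketch of the first working setup, assuming `julia` is on the PATH. The helper name `precompile_packages` is hypothetical. The sketch also assumes the same variable is exported in the job script on the compute node, which the thread does not state explicitly.

```shell
# List the older architecture first so that it becomes the base target.
export JULIA_CPU_TARGET='broadwell;skylake'

# Hypothetical helper, meant to be run on the login node.
# Assumes `julia` is on PATH.
precompile_packages() {
    julia -e 'using Pkg; Pkg.precompile()'
}
```

Calling `precompile_packages` on the login node and then loading CUDA on the compute node with the same `JULIA_CPU_TARGET` corresponds to the setup reported above as not triggering recompilation.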

simonbyrne, Jun 15 '23 17:06

So from https://docs.julialang.org/en/v1/devdocs/sysimg/#Specifying-multiple-system-image-targets

By default, only functions that are the most likely to benefit from the microarchitecture features will be cloned.

and

By default, a partially cloned (i.e. not clone_all) target will use functions from the default target (first one specified) if a function is not cloned.

E.g. 'skylake;broadwell' takes skylake as the base image and may then compile some functions for broadwell as an extension, which leads to something that is not loadable on broadwell.

@pchintalapudi raised the point offline that this is a non-ideal default and we probably should make clone_all the default.
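
To make the indexing concrete, here is a small sketch that splits a target string on `;` and prints each target with its 0-based index. The helper name `list_targets` is hypothetical and the sketch only manipulates the string; it does not call Julia. Target 0 is the default base for any partially cloned target, and `base(1)` in the longer string quoted earlier in the thread points at target 1, which is marked `clone_all`.

```shell
# Hypothetical helper: print each ';'-separated target with its 0-based index.
list_targets() {
    printf '%s\n' "$1" | tr ';' '\n' | awk '{ print (NR - 1) ": " $0 }'
}

list_targets 'generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)'
# 0: generic
# 1: sandybridge,-xsaveopt,clone_all
# 2: haswell,-rdrnd,base(1)

list_targets 'skylake;broadwell'
# 0: skylake
# 1: broadwell
```

In the second case skylake sits at index 0, so it is the base that a partially cloned broadwell target falls back to, which matches the explanation above.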

vchuravy, Jun 18 '23 21:06

This may be related to #54464 where we found that JULIA_CPU_TARGET basically only compiles for the first target (which would explain why it's order-dependent).

giordano, May 14 '24 21:05

IIUC, the order-dependence in this issue is intentional. You can argue clone_all should be the default, but I think the current behavior makes sense. #54464 looks like a different issue.

JeffBezanson, May 31 '24 19:05