XGBoost.jl
Huge import latency caused by `Term`, `GPUArrays`, and `CUDA`
julia> @time_imports using XGBoost
17.5 ms SparseMatricesCSR
17.1 ms AbstractTrees
7.5 ms OrderedCollections
0.1 ms SnoopPrecompile
68.5 ms Parsers 11.99% compilation time
12.0 ms StructTypes
71.2 ms JSON3
0.2 ms DataValueInterfaces
1.2 ms DataAPI
0.1 ms IteratorInterfaceExtensions
0.1 ms TableTraits
13.6 ms Tables
3.7 ms DocStringExtensions 75.50% compilation time
62.0 ms Highlights
1.3 ms MyterialColors
0.1 ms UnPack
0.3 ms Parameters
3.1 ms ProgressLogging
1.2 ms UnicodeFun
1.8 ms CodeTracking
958.0 ms Term
4.1 ms CEnum
20.1 ms Preferences
0.4 ms JLLWrappers
5.7 ms LLVMExtra_jll 59.12% compilation time
186.4 ms LLVM 41.51% compilation time (100% recompilation)
0.4 ms ExprTools
132.5 ms TimerOutputs 11.86% compilation time
551.1 ms GPUCompiler 1.29% compilation time
0.4 ms Adapt
0.1 ms Reexport
3.5 ms GPUArraysCore
876.0 ms GPUArrays
0.6 ms Requires
8.6 ms BFloat16s
0.7 ms CompilerSupportLibraries_jll
32.6 ms RandomNumbers 33.48% compilation time (13% recompilation)
7.7 ms Random123
1.2 ms Compat
82.5 ms ChainRulesCore
11.3 ms AbstractFFTs
1625.2 ms CUDA 0.75% compilation time
16.4 ms CUDA_Driver_jll 94.73% compilation time
0.2 ms CUDA_Runtime_jll
2.4 ms XGBoost_jll
9.5 ms XGBoost
I suppose an argument can be made for removing Term, but how would you propose to remove CUDA without removing important GPU features?
I'm wondering if you just need GPUArrays.jl for the interface.
Ah, that's a good point... technically this supports CuArray and not any other type of GPU array, but I suppose it wouldn't be so bad to use GPUArrays and let it crash in some other way if somebody tries to use something other than a CuArray (pretty sure libxgboost will crash somewhere).
Anyone else have opinions on this? I'm definitely willing to do the PR and make the change if there is consensus, but I'm not completely confident that nobody will take issue with it.
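For what it's worth, a rough sketch of what dispatching on the abstract type from GPUArraysCore (rather than on CuArray) could look like; `gpu_dmatrix` is a made-up name purely for illustration, not an actual function in this package:

```julia
# Sketch: depend only on the lightweight GPUArraysCore instead of all of CUDA.
using GPUArraysCore: AbstractGPUArray

# Accept any GPU array type. libxgboost itself only understands CUDA device
# memory, so passing some other vendor's array would fail somewhere inside
# the library, as discussed above.
function gpu_dmatrix(x::AbstractGPUArray{<:Real,2})
    # ... hand the device buffer over to libxgboost ...
end
```

That keeps the method signatures restricted to GPU arrays without pulling the full CUDA stack in at load time.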
Yeah, thanks for the GPU support; I'm just experiencing slow loading times on my poor cluster login node, so no rush.
IMO even GPUArrays alone seems to increase loading times quite significantly. It's nice to have GPU support, but personally I usually do not have access to dedicated GPUs, so to me it seems more like an extension than a basic part of the package (also highlighted by the fact that GPU support was added only recently). Maybe it would be better to use weak dependencies on Julia >= 1.9 and Requires on older Julia versions (https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)).
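For concreteness, the extensions mechanism from that link would look roughly like this in Project.toml (the extension module name here is illustrative; the UUID is CUDA.jl's):

```toml
# Project.toml (sketch): CUDA demoted to a weak dependency on Julia >= 1.9
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
# Loaded automatically only if the user also does `using CUDA`
XGBoostCUDAExt = "CUDA"
```

The GPU-specific methods would then live in ext/XGBoostCUDAExt.jl and cost CPU-only users nothing at load time.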
what about https://github.com/JuliaGPU/GPUArrays.jl/tree/master/lib/GPUArraysCore
It's nice to have GPU support, but personally I usually do not have access to dedicated GPUs, so to me it seems more like an extension
Yeah, I agree with this.
I'm not sure exactly what can be done with weak dependencies, but I'm certainly open to exploring it once 1.9 is released.
Personally I'm not too fond of the argument that stuff should be removed just because it has to compile. Compilation is just part of life, it's an issue for the compiler, not individual packages. That dependencies should be trimmed where possible seems like a significantly better argument to me.
That said, I'm at least open to all the specific proposals made here: yes, using GPUArrays only sounds good to me, I'm happy to embrace weak dependencies, and arguably the functionality I'm using Term for belongs in another package. On the other hand, I don't like the idea of having a separate package for GPU support at all.
Of course I'm not the only maintainer of this, so my opinion is hardly authoritative.
I too am having issues with unexpectedly depending on CUDA. In my case I am using PackageCompiler - here's a (cut down) snippet:
julia> using PackageCompiler; create_app(".", "dist")
PackageCompiler: bundled artifacts:
├── CUDA_Runtime_jll - 1.858 GiB
└── ...
Total artifact file size: 2.134 GiB
I would really like my bundle to be 2GB smaller. I'm assuming just using GPUArrays will at least fix the worst of the problem for now, and we can look at weak dependencies later?
CUDA_Runtime_jll - 1.858 GiB
Are you running on a machine with CUDA 11? If so, you might be able to eliminate the large CUDA artifacts by switching. The CUDA_Runtime_jll dependency comes from XGBoost_jll when it is built with CUDA; the non-CUDA builds do not have it.
https://github.com/JuliaPackaging/Yggdrasil/blob/2b49bcda328332a02b0d6ee3926bafb3f031d026/X/XGBoost/build_tarballs.jl#L92
I wonder if we would be able to handle the CUDA deps with the new conditional package loading: https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)
I wonder if we would be able to handle the CUDA deps with the new conditional package loading:
That was what I was referring to with weak dependencies in https://github.com/dmlc/XGBoost.jl/issues/156#issuecomment-1378105194 :slightly_smiling_face: It's great, I'm already using it in a few packages, but it requires Julia 1.9 - if one wants to support the conditional features on older Julia versions one either has to add the weak dependencies as hard dependencies on these Julia versions or use Requires (which does not support precompilation). In the case of XGBoost, the most natural approach would seem to keep the already existing dependencies as hard dependencies on Julia < 1.9.
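For reference, the usual Requires.jl fallback for Julia < 1.9 looks something like this (file and module names are placeholders, matching nothing in the actual package; the UUID is CUDA.jl's):

```julia
# src/XGBoost.jl (sketch): conditional GPU code on older Julia versions
using Requires

function __init__()
    @static if !isdefined(Base, :get_extension)  # i.e. Julia < 1.9
        # The included file would hold the same methods as the 1.9 extension
        @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
            include("../ext/XGBoostCUDAExt.jl")
        end
    end
end
```

The caveat mentioned above applies: code loaded via @require is not precompiled.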
Just to add that at RAI we're in exactly the same situation @andyferris mentioned above, where we're using PackageCompiler and do not want to include CUDA_Runtime_jll or CUDA_Driver_jll.
But given these deps seem to have been added in XGBoost_jll at v1.7.1+1, I don't think there's even a way to pin the version to avoid getting these? And so what we're having to do is just manually not check in the XGBoost-related changes to the Manifest.toml, which is really disruptive and error-prone.
e.g. when updating an unrelated package we see things like
(Repo) pkg> add --preserve=all XUnit @1.1.5
Resolving package versions...
Updating `~/Project.toml`
[3e3c03f2] ↑ XUnit v1.1.4 ⇒ v1.1.5
Updating `~/Manifest.toml`
[3e3c03f2] ↑ XUnit v1.1.4 ⇒ v1.1.5
[4ee394cb] + CUDA_Driver_jll v0.3.0+0
[76a88914] + CUDA_Runtime_jll v0.3.0+2
⌃ [a5c6f535] ↑ XGBoost_jll v1.7.1+0 ⇒ v1.7.1+1
Info Packages marked with ⌃ have new versions available and may be upgradable.
Precompiling project...
✓ XGBoost_jll
✓ XUnit
✓ XGBoost
✓ Repo
4 dependencies successfully precompiled in 42 seconds. 259 already precompiled. 1 skipped during auto due to previous errors.
We will definitely demote both CUDA and Term to weak dependencies once 1.9 is officially released. In the meantime, if someone wants to replace CUDA with GPUArrays, we can probably make that work. Otherwise, it looks like it's just going to have to wait. Suffice it to say, this problem is hardly unique to this package.
I don't think there's even a way to pin the version to avoid getting these?
@nickrobinson251 I did manage to pin both XGBoost at 2.0.2 and XGBoost_jll at 1.6.2+0, and that seemed to work for me.
Looking forward to Julia 1.9 still :)
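In case it helps anyone else, that pin can be done in one line in the Pkg REPL (versions as reported in the previous comment):

```julia-repl
# Pkg REPL mode (press `]` at the julia> prompt)
pkg> pin XGBoost@2.0.2 XGBoost_jll@1.6.2
```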
As a follow-up to this, I see there is some weakdeps stuff in the Project.toml, and when installing on Julia 1.9.2 I get this:
(@v1.9) pkg> add XGBoost
Resolving package versions...
Updating `~/.julia/environments/v1.9/Project.toml`
[009559a3] + XGBoost v2.3.0
Updating `~/.julia/environments/v1.9/Manifest.toml`
[fa961155] + CEnum v0.4.2
[9a962f9c] + DataAPI v1.15.0
[e2d170a0] + DataValueInterfaces v1.0.0
[82899510] + IteratorInterfaceExtensions v1.0.0
[692b3bcd] + JLLWrappers v1.4.1
[0f8b85d8] + JSON3 v1.13.1
[69de0a69] + Parsers v2.7.1
[a0a7dd2c] + SparseMatricesCSR v0.6.7
[856f2bd8] + StructTypes v1.10.0
[3783bdb8] + TableTraits v1.0.1
[bd369af6] + Tables v1.10.1
[009559a3] + XGBoost v2.3.0
[4ee394cb] + CUDA_Driver_jll v0.5.0+1
[76a88914] + CUDA_Runtime_jll v0.6.0+0
[1d63c593] + LLVMOpenMP_jll v15.0.4+0
[a5c6f535] + XGBoost_jll v1.7.5+2
[4af54fe1] + LazyArtifacts
[37e2e46d] + LinearAlgebra
[a63ad114] + Mmap
[2f01184e] + SparseArrays
[10745b16] + Statistics v1.9.0
[4607b0f0] + SuiteSparse
[8dfed614] + Test
[e66e0078] + CompilerSupportLibraries_jll v1.0.5+0
[4536629a] + OpenBLAS_jll v0.3.21+4
[bea87d4a] + SuiteSparse_jll v5.10.1+6
[8e850b90] + libblastrampoline_jll v5.8.0+0
I see that Term and CUDA aren't installed, which is great, but there seems to be a bunch of GPGPU stuff downloaded regardless (this is a lot of stuff for a Docker image intended for a machine that doesn't even have an NVIDIA GPU). Do we need to do the weakdep thing in XGBoost_jll too? (Will that work? Will the XGBoost_jll .so/.dll load and run fine in CPU mode if it can't find the CUDA .so/.dll files?)
Yes, the JLL still pulls in CUDA JLL dependencies. I asked about weakdeps/optional dependencies and GPU/non-GPU binaries in some issue in Yggdrasil a while ago, but there does not seem to be a solution to this problem yet. (Some other libraries are built in both GPU and non-GPU versions on Yggdrasil, but last time I checked there was no Julia package that actually tried to combine them; also, just depending on both would still pull in all the undesired binaries...)
Ah I see - thank you.