Use PrecompileTools to warm up CUDA.jl
So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.
I considered whether we need a mechanism to ensure this doesn't actively use the CUDA toolkit, which would prevent use on a system without a GPU, but I think CI should already cover that: https://github.com/JuliaGPU/CUDA.jl/blob/5da4d1d0355432758b3a50c0fed1a365d8f5e403/.buildkite/pipeline.yml#L198-L226. We should check that this actually works, e.g., by using a precompile workload that does initialize CUDA and confirming that the job fails.
> So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.
Correct!
Using https://github.com/JuliaGPU/GPUCompiler.jl/pull/557#issuecomment-2062299132, this improved TTFK (time to first kernel) from 12s to 4s.
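For anyone following along who hasn't used PrecompileTools, the general pattern looks roughly like the sketch below. This is not the contents of `src/precompile.jl` in this PR: the module name `WarmupSketch` and the `saxpy` function are hypothetical stand-ins, and in CUDA.jl the workload would exercise the GPU compiler rather than a toy function. The workload block only has an effect while the package is being precompiled.

```julia
module WarmupSketch  # hypothetical package, not CUDA.jl's actual precompile code

using PrecompileTools: @setup_workload, @compile_workload

# Hypothetical stand-in for the code path whose compilation we want cached.
saxpy(a, x, y) = a .* x .+ y

@setup_workload begin
    # Setup code: needed to build inputs, but its own compilation is not
    # what we are trying to cache.
    x = rand(Float32, 16)
    y = rand(Float32, 16)

    @compile_workload begin
        # Calls made here are compiled during package precompilation and
        # saved in the package's cache, so the first real call is faster.
        saxpy(2f0, x, y)
    end
end

end # module
```

The point relevant to this PR is that the workload must not need a GPU or initialize the CUDA toolkit, since precompilation also happens on machines without either.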
Codecov Report
Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.
Project coverage is 59.96%. Comparing base (14de009) to head (c7f880c).
:exclamation: Current head c7f880c differs from pull request most recent head 03530f0
Please upload reports for the commit 03530f0 to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| src/precompile.jl | 12.50% | 7 Missing :warning: |
Additional details and impacted files
| Coverage Diff | master | #2325 | +/- |
|---|---|---|---|
| Coverage | 73.37% | 59.96% | -13.42% |
| Files | 157 | 156 | -1 |
| Lines | 15197 | 14989 | -208 |
| Hits | 11151 | 8988 | -2163 |
| Misses | 4046 | 6001 | +1955 |
Fails on 1.11:
2024-09-18 10:44:13 CEST ERROR: The following 1 direct dependency failed to precompile:
2024-09-18 10:44:13 CEST
2024-09-18 10:44:13 CEST CUDA --code-coverage=@/var/lib/buildkite-agent/builds/gpuci-7/julialang/cuda-dot-jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none
2024-09-18 10:44:13 CEST
2024-09-18 10:44:13 CEST Failed to precompile CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] to "/root/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/compiled/v1.11/CUDA/jl_aa67nH".
2024-09-18 10:44:13 CEST LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys