
Use PrecompileTools to warmup CUDA.jl

Open vchuravy opened this issue 1 year ago • 4 comments

vchuravy avatar Apr 15 '24 17:04 vchuravy

So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.

I considered whether we need a mechanism to ensure this doesn't actively use the CUDA toolkit, which would prevent use on a system without a GPU, but I think CI should already cover that: https://github.com/JuliaGPU/CUDA.jl/blob/5da4d1d0355432758b3a50c0fed1a365d8f5e403/.buildkite/pipeline.yml#L198-L226. We should check that this actually works (e.g., by using a precompile workload that does initialize CUDA and ensuring that the job fails).
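
For illustration only, here is a minimal sketch of what such a mechanism could look like: a guard that raises an error if initialization is attempted while Julia is generating a precompile cache. The names `is_precompiling` and `initialize_gpu_runtime` are hypothetical and are not part of CUDA.jl; the only real call is the `jl_generating_output` runtime query.

```julia
# Hypothetical helper, not part of CUDA.jl.
# `jl_generating_output` returns 1 while Julia is writing a
# precompile cache or system image, and 0 in a normal session.
is_precompiling() = ccall(:jl_generating_output, Cint, ()) == 1

# Hypothetical stand-in for any routine that would touch the
# driver or toolkit. It refuses to run during precompilation, so a
# workload that initializes the GPU fails loudly instead of silently
# depending on hardware being present.
function initialize_gpu_runtime()
    is_precompiling() && error("GPU runtime initialized during precompilation")
    # Real initialization would happen here; this sketch only returns a marker.
    return :initialized
end
```

In a regular session the guard is a no-op and `initialize_gpu_runtime()` returns normally; it only throws when called from top-level package code or a precompile workload.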

maleadt avatar Apr 15 '24 17:04 maleadt

> So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.

Correct!

Using https://github.com/JuliaGPU/GPUCompiler.jl/pull/557#issuecomment-2062299132, this improved time to first kernel (TTFK) from 12 s to 4 s.

vchuravy avatar Apr 17 '24 20:04 vchuravy

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 59.96%. Comparing base (14de009) to head (c7f880c).

:exclamation: Current head c7f880c differs from pull request most recent head 03530f0

Please upload reports for the commit 03530f0 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| src/precompile.jl | 12.50% | 7 Missing :warning: |

@@             Coverage Diff             @@
##           master    #2325       +/-   ##
===========================================
- Coverage   73.37%   59.96%   -13.42%     
===========================================
  Files         157      156        -1     
  Lines       15197    14989      -208     
===========================================
- Hits        11151     8988     -2163     
- Misses       4046     6001     +1955     


codecov[bot] avatar Apr 19 '24 15:04 codecov[bot]

Fails on 1.11:

2024-09-18 10:44:13 CEST	ERROR: The following 1 direct dependency failed to precompile:
2024-09-18 10:44:13 CEST	
2024-09-18 10:44:13 CEST	CUDA --code-coverage=@/var/lib/buildkite-agent/builds/gpuci-7/julialang/cuda-dot-jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none
2024-09-18 10:44:13 CEST	
2024-09-18 10:44:13 CEST	Failed to precompile CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] to "/root/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/compiled/v1.11/CUDA/jl_aa67nH".
2024-09-18 10:44:13 CEST	LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys

maleadt avatar Sep 18 '24 09:09 maleadt