[BUG] Nightly CI issue: CUDA 11.4 jobs were running with CUDA 11.8 when NCCL wasn't available
NCCL 2.22.3.1 was not available in conda-forge for CUDA < 11.8 until yesterday, which showed up in cuML's CI as failures in all CUDA 11.4 jobs until today. RAFT's CUDA 11.4 CI, however, kept passing regardless (which confused me for a while).
Checking the jobs, they were installing `cuda-version` 11.8 and the corresponding packages. For example, the following snippet from this CUDA 11.4 log shows the issue when installing the downloaded artifacts:
```
Upgrade:
─────────────────────────────────────────────────────────────────────────────
  - cuda-version  11.4    hfb901f2_3   conda-forge  Cached
  + cuda-version  11.8    h70ddcb2_3   conda-forge   21kB
  - cudatoolkit   11.4.3  h39f8164_13  conda-forge  Cached
  + cudatoolkit   11.8.0  h4ba93d1_13  conda-forge  716MB
```
This should of course not be happening on CUDA 11.4 jobs. The NCCL issue itself should no longer be a problem, but any other package could cause the same situation, which could make things fail silently in the future and catch us by surprise, defeating the point of having 11.4 jobs in nightly CI.
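One way to make this fail loudly instead of silently would be to pin `cuda-version` explicitly in the install step. Here is a minimal sketch, assuming the job installs the artifacts with `mamba install` from a local channel; `RAPIDS_CUDA_VERSION`, `ARTIFACT_CHANNEL`, and the package names are assumptions for illustration, not the actual CI script:

```sh
# Hypothetical sketch: pin cuda-version to the job's CUDA version so the
# solver errors out instead of silently upgrading the environment to 11.8.
# RAPIDS_CUDA_VERSION and ARTIFACT_CHANNEL are assumed CI variables.
CUDA_MAJOR_MINOR="${RAPIDS_CUDA_VERSION%.*}"   # e.g. "11.4.3" -> "11.4"

mamba install \
  --yes \
  --channel "${ARTIFACT_CHANNEL}" \
  "cuda-version=${CUDA_MAJOR_MINOR}" \
  libraft raft-dask
```

With an explicit pin, a missing NCCL build for 11.4 would surface as an unsolvable-environment error rather than a quiet upgrade to 11.8.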
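As a belt-and-braces guard against any other package triggering the same upgrade, the jobs could also assert the resolved `cuda-version` after the install step. A sketch, again assuming a conda environment and a `RAPIDS_CUDA_VERSION` variable holding the matrix's CUDA version:

```sh
# Hypothetical guard: fail fast if the resolved cuda-version does not match
# the CUDA version the CI matrix asked for.
EXPECTED="${RAPIDS_CUDA_VERSION%.*}"   # e.g. "11.4"
ACTUAL="$(conda list --json '^cuda-version$' \
  | python -c 'import json, sys; print(json.load(sys.stdin)[0]["version"])')"

if [[ "${ACTUAL}" != "${EXPECTED}" ]]; then
  echo "cuda-version mismatch: expected ${EXPECTED}, got ${ACTUAL}" >&2
  exit 1
fi
```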