Switch to conda-forge PyTorch, install only conda-forge libs (pyg-lib, torch-scatter)

Open alexbarghi-nv opened this issue 10 months ago • 7 comments

alexbarghi-nv avatar Apr 15 '24 18:04 alexbarghi-nv

With this PR, do we still need to skip tests on CUDA 12? I think there should be packages available for CUDA 12 on the new channels. If not, we should identify what is missing.

RAPIDS logger » [04/17/24 20:09:23]
┌─────────────────────────────────────────────────┐
|    skipping cugraph_dgl pytest on CUDA!=11.8    |
└─────────────────────────────────────────────────┘


RAPIDS logger » [04/17/24 20:09:23]
┌─────────────────────────────────────────────────┐
|    skipping cugraph_pyg pytest on CUDA!=11.8    |
└─────────────────────────────────────────────────┘


RAPIDS logger » [04/17/24 20:09:23]
┌─────────────────────────────────────────────────────────┐
|    skipping cugraph-equivariant pytest on CUDA!=11.8    |
└─────────────────────────────────────────────────────────┘
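For readers unfamiliar with this kind of gate, the skip messages above can be reproduced by a check like the following. This is only a Python sketch of equivalent logic, not the actual CI script: the environment variable name `RAPIDS_CUDA_VERSION` and the helper `should_run_gnn_tests` are assumptions made for illustration.

```python
import os


def should_run_gnn_tests(cuda_version: str, supported=("11.8",)) -> bool:
    """Return True when cuda_version equals, or is a patch release of, a supported version.

    Hypothetical helper: "11.8" and "11.8.0" match the prefix "11.8", but "11.80" does not.
    """
    return any(
        cuda_version == prefix or cuda_version.startswith(prefix + ".")
        for prefix in supported
    )


# Assumption: the CI job exposes its CUDA version through this variable.
cuda_version = os.environ.get("RAPIDS_CUDA_VERSION", "")

for suite in ("cugraph_dgl", "cugraph_pyg", "cugraph-equivariant"):
    if should_run_gnn_tests(cuda_version):
        print(f"running {suite} pytest")
    else:
        print(f"skipping {suite} pytest on CUDA!=11.8")
```

Re-enabling CUDA 12 testing would then amount to widening `supported`, for example to `("11.8", "12.2")`, provided matching packages can actually be installed.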

bdice avatar Apr 17 '24 20:04 bdice

I just reenabled CUDA 12 testing. Let's see what happens.

alexbarghi-nv avatar Apr 19 '24 17:04 alexbarghi-nv

CUDA 12 testing still seems to be blocked because our CI is using CUDA 12.2 but there are only CUDA 12.1 packages for PyG, DGL and PyTorch available on conda.

alexbarghi-nv avatar Apr 23 '24 14:04 alexbarghi-nv

I think the packages should be compatible with CUDA 12.2. Let's give this another try.

jakirkham avatar May 01 '24 18:05 jakirkham

@jakirkham both the PyG and DGL tests for CUDA 12.2 failed because of the CUDA version incompatibility.

alexbarghi-nv avatar May 02 '24 14:05 alexbarghi-nv

Mamba wasn't able to install the right packages because we had a conflict between packages for CUDA 12.2 and CUDA 12.1. I'm not sure how we can resolve this in conda.

alexbarghi-nv avatar May 02 '24 14:05 alexbarghi-nv

@alexbarghi-nv @jakirkham Perhaps this can help (copying from an internal discussion). You can just install CUDA 12.1 from conda-forge to align with PyTorch's CUDA version.

Try this. PyTorch only has CUDA 12.1 builds on the pytorch channel, so you probably need to match to that. RAPIDS will work with CUDA 12.1, even though the install page only lists 12.0 and 12.2.

mamba create -n pytorch-rapids-cuda-12.1 rapids pytorch torchvision torchaudio pytorch-cuda=12.1 cuda-version=12.1 -c rapidsai -c pytorch -c conda-forge -c nvidia

I verified that both cudf and torch can be imported and used:

import cudf
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
arr = torch.ones(100, device=device)
print(arr)

print(cudf.Series(arr))
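
To confirm that the environment really resolved to the CUDA 12.1 build of PyTorch, a small check along these lines may help. It is a sketch rather than part of the original verification: the helper names `cuda_major_minor` and `torch_cuda_build` and the `expected` value are chosen for illustration. It relies only on `torch.version.cuda`, which reports the CUDA version PyTorch was built against (or `None` for CPU-only builds).

```python
def cuda_major_minor(version: str) -> tuple:
    """Reduce a version string such as '12.1.0' to (12, 1). Hypothetical helper."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def torch_cuda_build():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is not installed, or its shared libraries failed to load.
        return None
    return torch.version.cuda  # None for CPU-only builds


expected = "12.1"  # assumption: matches the pytorch-cuda / cuda-version pins above
found = torch_cuda_build()

if found is None:
    print("PyTorch is missing or is a CPU-only build")
elif cuda_major_minor(found) == cuda_major_minor(expected):
    print(f"PyTorch was built for CUDA {found}, as expected")
else:
    print(f"PyTorch was built for CUDA {found}, expected {expected}")
```

A mismatch here would suggest the solver picked a different PyTorch build than the one the pins were meant to select.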

bdice avatar May 02 '24 17:05 bdice

Closing this; I think we need to wait until the repository migration, which will pin the RAPIDS version and resolve that problem.

alexbarghi-nv avatar Jun 03 '24 16:06 alexbarghi-nv