cugraph
[EXP] Enable cuGraph workflow with Ray GPU Cluster
Currently, we support multi-GPU workflows using PyTorch DDP with RAFT, as well as Dask.
See example: https://github.com/rapidsai/cugraph-gnn/blob/e6000e53f7b1a6bb0834d69e8d54a5af16583289/python/cugraph-pyg/cugraph_pyg/tests/loader/test_neighbor_loader_mg.py#L34-L58
We should similarly explore enabling this with a Ray GPU cluster.
This involves using cugraph_nccl_comms in a Ray setting instead of PyTorch DDP / Dask:
https://github.com/rapidsai/cugraph/blob/1ef3f56eb19b9b0190df3b038ed790154aefd568/python/cugraph/cugraph/gnn/comms/cugraph_nccl_comms.py#L53-L72
Workflow to Test:
- Set up a Ray GPU cluster (@ayushdg to share scripts for this).
- Set up comms using the Ray cluster.
- Create a pylibcugraph.MGGraph similar to:
https://github.com/rapidsai/cugraph-gnn/blob/e6000e53f7b1a6bb0834d69e8d54a5af16583289/python/cugraph-pyg/cugraph_pyg/data/graph_store.py#L141-L166
- Call Connected Components, similar to: https://github.com/rapidsai/cugraph/blob/d92c257acc88522e775850c2166cd723321caf69/python/cugraph/cugraph/dask/components/connectivity.py#L99-L120
- Write results to a Parquet file.
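A rough sketch of how the first two workflow steps might look, to make the discussion concrete. It assumes one Ray actor per GPU, that the NCCL unique id can be created on the driver and shipped to the actors, and that each actor sees its assigned GPU as local device 0 (Ray narrows CUDA_VISIBLE_DEVICES per actor). The names make_worker_specs, run_on_ray, and CugraphWorker are hypothetical and made up for illustration; only the cugraph_comms_* functions come from cugraph_nccl_comms. This has not been run on a cluster.

```python
from typing import Any, Dict, List


def make_worker_specs(world_size: int, uid: Any) -> List[Dict[str, Any]]:
    """Build one comms-init spec per rank; every rank shares the same unique id."""
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    # Assumption: device=0, since each Ray actor only sees the GPU assigned to it.
    return [
        {"rank": rank, "world_size": world_size, "uid": uid, "device": 0}
        for rank in range(world_size)
    ]


def run_on_ray(world_size: int) -> List[int]:
    """Start one GPU actor per rank and initialize cuGraph comms in each.

    Needs Ray, cuGraph, and GPUs; imports are lazy so the helper above
    can be used without any of them installed.
    """
    import ray
    from cugraph.gnn import cugraph_comms_create_unique_id

    @ray.remote(num_gpus=1)
    class CugraphWorker:  # hypothetical actor name
        def init_comms(self, spec: Dict[str, Any]) -> int:
            from cugraph.gnn import cugraph_comms_init

            cugraph_comms_init(**spec)
            return spec["rank"]

        def shutdown(self) -> None:
            from cugraph.gnn import cugraph_comms_shutdown

            cugraph_comms_shutdown()

    # Connect to an already running Ray cluster.
    ray.init(address="auto")

    # Assumption: the driver can create the id; otherwise rank 0 could do it.
    uid = cugraph_comms_create_unique_id()
    specs = make_worker_specs(world_size, uid)

    workers = [CugraphWorker.remote() for _ in specs]
    ranks = ray.get([w.init_comms.remote(s) for w, s in zip(workers, specs)])

    # Remaining workflow steps (MGGraph creation, Connected Components,
    # Parquet output) would be added as further actor methods here.

    ray.get([w.shutdown.remote() for w in workers])
    return ranks
```

Keeping the rank/uid bookkeeping in a plain function means it can be checked without a cluster; run_on_ray(world_size) would then be called from the driver once the Ray GPU cluster is up.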
CC: @BradReesWork, @quasiben, @ayushdg, @randerzander