[EXP] Enable cuGraph workflow with Ray GPU Cluster

Open VibhuJawa opened this issue 1 year ago • 0 comments

Enable cuGraph workflow with Ray GPU Cluster

Currently, we support using PyTorch DDP with RAFT, along with Dask.

See example: https://github.com/rapidsai/cugraph-gnn/blob/e6000e53f7b1a6bb0834d69e8d54a5af16583289/python/cugraph-pyg/cugraph_pyg/tests/loader/test_neighbor_loader_mg.py#L34-L58

We should similarly explore enabling this with a Ray GPU cluster.

This involves using cugraph_nccl_comms in a Ray setting instead of PyTorch DDP / Dask:

https://github.com/rapidsai/cugraph/blob/1ef3f56eb19b9b0190df3b038ed790154aefd568/python/cugraph/cugraph/gnn/comms/cugraph_nccl_comms.py#L53-L72
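A rough sketch of what the comms setup could look like with Ray actors, one actor per GPU. This is untested and makes several assumptions: `CommsWorker`, `rank_assignments` and `init_comms_on_ray` are hypothetical names, the `cugraph.gnn` comms helpers are assumed to have the same signatures as in the DDP test linked above, and each actor is assumed to see its GPU as device 0 because Ray restricts `CUDA_VISIBLE_DEVICES` per actor.

```python
def rank_assignments(world_size):
    """Return one (rank, world_size) pair per Ray actor, ranks 0..world_size-1."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return [(rank, world_size) for rank in range(world_size)]


def init_comms_on_ray(world_size):
    """Start one actor per GPU and initialize cuGraph comms on each (untested sketch)."""
    # Imported lazily so this file still loads on machines without Ray or RAPIDS.
    import ray

    @ray.remote(num_gpus=1)
    class CommsWorker:  # hypothetical name, not part of cuGraph or Ray
        def create_uid(self):
            from cugraph.gnn import cugraph_comms_create_unique_id

            # The NCCL unique id is created once and shared with every rank.
            return cugraph_comms_create_unique_id()

        def init_comms(self, rank, world_size, uid):
            from cugraph.gnn import cugraph_comms_init

            # Assumption: Ray exposes only this actor's GPU, so it is device 0.
            cugraph_comms_init(rank=rank, world_size=world_size, uid=uid, device=0)
            return rank

        def shutdown(self):
            from cugraph.gnn import cugraph_comms_shutdown

            cugraph_comms_shutdown()

    # Connect to an already running Ray cluster.
    ray.init(address="auto", ignore_reinit_error=True)
    workers = [CommsWorker.remote() for _ in range(world_size)]
    uid = ray.get(workers[0].create_uid.remote())

    # Comms init is collective, so submit every call before waiting on any of them.
    pending = [
        worker.init_comms.remote(rank, size, uid)
        for worker, (rank, size) in zip(workers, rank_assignments(world_size))
    ]
    ray.get(pending)
    return workers
```

The unique id is created inside the rank 0 actor rather than on the driver so that the driver process does not need a GPU or a RAPIDS install; whether that matters depends on how the cluster is set up.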

Workflow to Test:

  1. Set up a Ray GPU cluster (@ayushdg to share scripts for this).
  2. Set up comms using the Ray cluster.
  3. Create a pylibcugraph.MGGraph, similar to: https://github.com/rapidsai/cugraph-gnn/blob/e6000e53f7b1a6bb0834d69e8d54a5af16583289/python/cugraph-pyg/cugraph_pyg/data/graph_store.py#L141-L166
  4. Call Connected Components, similar to: https://github.com/rapidsai/cugraph/blob/d92c257acc88522e775850c2166cd723321caf69/python/cugraph/cugraph/dask/components/connectivity.py#L99-L120
  5. Write results to a parquet file.
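The graph creation, Connected Components and parquet steps could be sketched per rank roughly as below. This is untested and only mirrors the two linked snippets: `edge_slice`, `part_path` and `wcc_on_rank` are hypothetical helpers, the `MGGraph` arguments follow the linked `graph_store.py` revision and may differ in other releases, and the edge list is assumed to already contain both directions of every edge, since weakly connected components expects a symmetric graph.

```python
import os


def edge_slice(num_edges, rank, world_size):
    """Return the [start, stop) range of edges owned by this rank (contiguous split)."""
    base, extra = divmod(num_edges, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def part_path(out_dir, rank):
    """Return the per-rank parquet path (hypothetical naming scheme)."""
    return os.path.join(out_dir, f"wcc_part_{rank:05d}.parquet")


def wcc_on_rank(rank, world_size, src, dst, out_dir):
    """Build this rank's part of the MGGraph, run WCC, write a parquet part.

    Must be called on every rank after comms are initialized; both the graph
    build and the algorithm are collective. src and dst are host arrays
    holding the full symmetrized edge list.
    """
    # Imported lazily so the helpers above work without a GPU.
    import cupy
    import cudf
    import pylibcugraph
    from cugraph.gnn import cugraph_comms_get_raft_handle

    handle = pylibcugraph.ResourceHandle(cugraph_comms_get_raft_handle().getHandle())
    start, stop = edge_slice(len(src), rank, world_size)

    props = pylibcugraph.GraphProperties(is_symmetric=True, is_multigraph=False)
    # Assumption: argument layout as in the linked graph_store.py revision.
    graph = pylibcugraph.MGGraph(
        handle,
        props,
        [cupy.asarray(src[start:stop]).astype("int64")],
        [cupy.asarray(dst[start:stop]).astype("int64")],
        num_arrays=1,
        store_transposed=False,
        do_expensive_check=False,
    )

    # Same keyword arguments as the linked dask connectivity code.
    vertices, labels = pylibcugraph.weakly_connected_components(
        resource_handle=handle,
        graph=graph,
        offsets=None,
        indices=None,
        weights=None,
        labels=None,
        do_expensive_check=False,
    )

    path = part_path(out_dir, rank)
    cudf.DataFrame({"vertex": vertices, "labels": labels}).to_parquet(path)
    return path
```

In a Ray setting, `wcc_on_rank` would run as a method on each GPU actor, with every rank writing its own part file into a shared output directory.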

CC: @BradReesWork, @quasiben, @ayushdg, @randerzander

VibhuJawa, Aug 14 '24 20:08