cugraph
cugraph copied to clipboard
[QST]: MST cuGraph RuntimeError: RAFT failure, how to fix?
What is your question?
I'm trying to conduct a Steiner Tree approximation with several terminal nodes (and subgraphs I need to analyze). However, whenever I try to iterate through, it fails at MST despite the graph being Undirected: RuntimeError: RAFT failure at file=/scratch/user/anaconda3/envs/rapids-24.04/include/raft/sparse/solver/detail/mst_solver_inl.cuh line=152:
The code works if I run through terminal nodes for each subgraph (one at a time). I already checked for self-loops, isolated nodes, or disconnected components in the big graph data, and they don't exist. The data types are to the Dask cugraph standard, so there is no issue there. Even if I change `DiG = cugraph.Graph(directed=False)' problem persists with the same error message. I'm not sure how to address this error... or am I missing something here? Graph data (df_graph) has 46.6M edges, directed and unweighted. My code below:
#Set working directory
import os
os.chdir('/scratch')
# Install required packages
import pandas as pd
import matplotlib.pyplot as plt
import yaml
import sys
import csv
# Install Networka
import networkx as nx
import cudf
cudf.__version__
import cugraph
cugraph.__version__
# Import Dask to utilize Muti-GPU
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import cugraph.dask as dcg
import cugraph.dask.comms.comms as Comms
from cugraph.generators.rmat import rmat
# Initialize Dask CUDA cluster and client
cluster = LocalCUDACluster()
client = Client(cluster)
# Initialize cuGraph communication
Comms.initialize(p2p=True)
columns_to_read = ['subject', 'object', 'predicate']
# Read the TSV file using Dask cuDF with specific columns
df_graph = dask_cudf.read_csv('big_graph_edge.tsv', sep='\t',
usecols=columns_to_read)
# Renaming columns to match cuGraph requirements
df_graph = df_graph.rename(columns={'subject': 'source', 'object': 'destination'})
print(df_graph.head())
# Create graph from input data
DiG = cugraph.Graph(directed=True)
DiG.from_dask_cudf_edgelist(df_graph, source = 'source', destination = 'destination' )
# Verify the graph has edges
print("Number of edges in the graph:", DiG.number_of_edges())
df = pd.read_csv('/scratch/subgraph_nodes.csv', sep = ",")
df
# Define a function to clean each c_id list
def get_terminals(c_ids):
# Remove the leading and trailing square brackets
c_ids = c_ids.strip("[]")
# Split by comma and clean each element
return [c_id.strip().strip("'") for c_id in c_ids.split(',')]
# Apply the cleaning function to the 'c_ids' column and convert to list of lists
terminals = df['c_ids'].head().apply(lambda x: get_terminals(x) if pd.notnull(x) else []).tolist()
terminals # output is list of lists
# Filter out terminals that are not in the graph
filt_subgraphs = [[t for t in terminal if DiG.has_node(t)] for terminal in terminals]
# Check if there are valid terminals
if not filt_subgraphs:
raise ValueError("None of the terminal vertices are present in the graph")
def bfs_with_cache(graph, start, cache):
if start in cache:
return cache[start]
distances = dcg.bfs(graph, start=start)
distances_df = distances.compute().to_pandas()
cache[start] = distances_df
return distances_df
def get_edges(lst,DiG,cache):
edges = []
# Find the shortest paths between all pairs of terminal vertices using BFS
for l in lst:
distances_df = bfs_with_cache(DiG,l,cache)
filtered_df = distances_df[distances_df['vertex'].isin(lst)]
for _, row in filtered_df.iterrows():
if row['vertex'] != l:
edges.append((l, row['vertex'], row['distance']))
# Create a new DataFrame to hold the edges of the shortest path graph
edges_df = pd.DataFrame(edges, columns=['source', 'destination', 'weight'])
return edges_df
def get_mst_results(edges_df):
# Create a new graph for MST purpose
mst_graph = cugraph.Graph(directed=False)
mst_graph.from_pandas_edgelist(edges_df, source='source', destination='destination', edge_attr='weight')
# Compute the MST
mst = cugraph.minimum_spanning_tree(mst_graph)
# Extract the edge list of the MST
mst_edges = mst.view_edge_list()
mst_df = mst_edges.to_pandas()
# Get number of edges and nodes
num_edges = mst_df.shape[0]
num_nodes = len(pd.concat([mst_df['src'], mst_df['dst']]).unique())
return num_edges, num_nodes
cache = {}
# need access to ids, associate edges to graph ids
#print("Graph ID:")
### Main code execution
for subgraph in filt_subgraphs:
graph_edges = get_edges(subgraph,DiG,cache)
n_edges,n_nodes = get_mst_results(graph_edges)
print(f"Number of edges in the MST: {n_edges}")
print(f"Number of nodes in the MST: {n_nodes}")
print("\n")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[21], line 10
8 for subgraph in filt_subgraphs:
9 graph_edges = get_edges(subgraph,DiG,cache)
---> 10 n_edges,n_nodes = get_mst_results(graph_edges)
12 print(f"Number of edges in the MST: {n_edges}")
13 print(f"Number of nodes in the MST: {n_nodes}")
Cell In[16], line 7, in get_mst_results(edges_df)
4 mst_graph.from_pandas_edgelist(edges_df, source='source', destination='destination', edge_attr='weight')
6 # Compute the MST
----> 7 mst = cugraph.minimum_spanning_tree(mst_graph)
9 # Extract the edge list of the MST
10 mst_edges = mst.view_edge_list()
File /scratch/user/anaconda3/envs/rapids-24.04/lib/python3.11/site-packages/cugraph/tree/minimum_spanning_tree.py:105, in minimum_spanning_tree(G, weight, algorithm, ignore_nan)
103 return cugraph_to_nx(mst)
104 else:
--> 105 return _minimum_spanning_tree_subgraph(G)
File /scratch/user/anaconda3/envs/rapids-24.04/lib/python3.11/site-packages/cugraph/tree/minimum_spanning_tree.py:26, in _minimum_spanning_tree_subgraph(G)
24 if G.is_directed():
25 raise ValueError("input graph must be undirected")
---> 26 mst_df = minimum_spanning_tree_wrapper.minimum_spanning_tree(G)
27 if G.renumbered:
28 mst_df = G.unrenumber(mst_df, "src")
File minimum_spanning_tree_wrapper.pyx:71, in cugraph.tree.minimum_spanning_tree_wrapper.minimum_spanning_tree()
File minimum_spanning_tree_wrapper.pyx:52, in cugraph.tree.minimum_spanning_tree_wrapper.mst_double()
RuntimeError: RAFT failure at file=/scratch/user/anaconda3/envs/rapids-24.04/include/raft/sparse/solver/detail/mst_solver_inl.cuh line=152:
Code of Conduct
- [X] I agree to follow cuGraph's Code of Conduct
- [X] I have searched the open issues and have found no duplicates for this question
Is this a dataset you can share? This is not an error I've seen with some of our existing data sets. The call is somehow getting more answers than should be possible. It's probably a bug that only occurs with a specific dataset.
@ChuckHastings Thanks for the earlier help with running cugraph leiden. I am observing this issue with cugraph MST as well. It runs on the com-Orkut graph, but not on the following graphs:
- web-Stanford
- indochina-2004
- com-LiveJournal
- asia_osm
- europe_osm
- kmer_A2a
- kmer_V1r
The reported error is as follows:
Converting /home/graphwork/Data/web-Stanford.mtx to /home/graphwork/Data/web-Stanford.mtx.csv ...
Running cuGraph MST on /home/graphwork/Data/web-Stanford.mtx.csv ...
Initializing RMM pool...
Reading graph from file: /home/graphwork/Data/web-Stanford.mtx.csv
Creating cuGraph graph...
Running minimum_spanning_tree (first)...
Traceback (most recent call last):
File "/home/graphwork/Documents/MST/test-cugraph-mst/main.py", line 31, in <module>
mst = cugraph.minimum_spanning_tree(G)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/graphwork/miniconda3/envs/cugraph-env/lib/python3.11/site-packages/cugraph/tree/minimum_spanning_tree.py", line 105, in minimum_spanning_tree
return _minimum_spanning_tree_subgraph(G)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/graphwork/miniconda3/envs/cugraph-env/lib/python3.11/site-packages/cugraph/tree/minimum_spanning_tree.py", line 26, in _minimum_spanning_tree_subgraph
mst_df = minimum_spanning_tree_wrapper.minimum_spanning_tree(G)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "minimum_spanning_tree_wrapper.pyx", line 71, in cugraph.tree.minimum_spanning_tree_wrapper.minimum_spanning_tree
File "minimum_spanning_tree_wrapper.pyx", line 52, in cugraph.tree.minimum_spanning_tree_wrapper.mst_double
RuntimeError: RAFT failure at file=/home/graphwork/miniconda3/envs/cugraph-env/include/raft/sparse/solver/detail/mst_solver_inl.cuh line=152:
The output on com-Orkut (using A100) is as follows:
Converting /home/graphwork/Data/com-Orkut.mtx to /home/graphwork/Data/com-Orkut.mtx.csv ...
Running cuGraph MST on /home/graphwork/Data/com-Orkut.mtx.csv ...
Initializing RMM pool...
Reading graph from file: /home/graphwork/Data/com-Orkut.mtx.csv
Creating cuGraph graph...
Running minimum_spanning_tree (first)...
Running minimum_spanning_tree...
minimum_spanning_tree weight: 3072440.000000
minimum_spanning_tree took: 0.066725 s
Running minimum_spanning_tree...
minimum_spanning_tree weight: 3072440.000000
minimum_spanning_tree took: 0.066817 s
Running minimum_spanning_tree...
minimum_spanning_tree weight: 3072440.000000
minimum_spanning_tree took: 0.066751 s
Running minimum_spanning_tree...
minimum_spanning_tree weight: 3072440.000000
minimum_spanning_tree took: 0.066882 s
I can recreate your error with our latest code. I will be digging into this and let you know how things progress.
@wolfram77 I was able to reproduce the error you are seeing with a much smaller datasets [karate]. You can track the progress of this work in this issue where I provide explanations pertaining to this issue. In fact, you are not providing edge weights hence the implied values are 1.0 making them non distinct and a quick solution while this issue is addressed in the raft API could be to provide distinct weights as below. However, this may lead to non deterministic but valid result.
df = cudf.read_csv(datasets, sep=' ', usecols=columns_to_read)
distinct_wgt = np.random.choice(np.arange(0, 1, 0.001), size=len(df), replace=False)
df['wgt'] = distinct_wgt
G.from_cudf_edgelist(df, source='src', destination='dst', edge_attr='wgt' renumber=False)
@jnke2016 Thanks for an update on this. So, assigning random weights fixes this issue. Cool. My friend was working on this problem for massive graphs, and so I was trying this out.
https://github.com/rapidsai/raft/pull/2707
I went through the issue again. It seems you are adding random weights in the range [0, 1). This would make the new weights [1, 2). This would work, since the weight of no edge can be greater than or equal to the weight of two or more edges.
But, this assumption will break on weighted graphs, with say, all edge weights being 0.1. In such a case multiple edge connecting another vertex could be considered shorter than a single edge.
Perpahs, the easiest approach would be to consider the vertex IDs for tie breaking, instead of relying purely of edge weights. What do you think?
I went through the issue again. It seems you are adding random weights in the range [0, 1)
That's correct.
But, this assumption will break on weighted graphs, with say, all edge weights being 0.1. In such a case multiple edge connecting another vertex could be considered shorter than a single edge.
Keep in mind that the random weights are added in a way that preserves the relative order the edges hence this will work even if all edge weights are 0.1. In fact, there is a test covering this case were all weights are either 0.1 or 0.2.
Perpahs, the easiest approach would be to consider the vertex IDs for tie breaking, instead of relying purely of edge weights. What do you think
I do agree that the random weights approach exposes the implementation to several edge cases hence, our new implementation of MST(not currently on our roadmap) will likely break ties with vertex IDs.
@jnke2016 I have added a comment in https://github.com/rapidsai/raft/pull/2707/files
Thanks @wolfram77 . I replied