cugraph
[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph`
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
High
Please provide a clear description of problem this feature solves
When we use nx-cugraph, we currently have to create the NetworkX graph on the CPU, even when every algorithm we intend to use is supported by the cuGraph backend. As a result, we pay a non-trivial performance penalty converting between CPU and GPU graphs.
The new caching mechanism, configurable via `CACHE_CONVERTED_GRAPH=True`, was designed to address this problem: it makes it possible to pay this cost only once per graph when running multiple algorithms.
But it would be better still to avoid this cost in the first place by dispatching the graph construction functions, not just the algorithms. In the example below, we spend significant time in `from_pandas_edgelist` and `_convert_graph` (the latter is only a one-time cost if caching is enabled).
If I've already made cuGraph my top-priority backend, I'd ideally create the graph directly on the GPU and pay the CPU/GPU conversion cost only when I need to fall back to the CPU.
```python
# !wget https://data.rapids.ai/cugraph/datasets/cit-Patents.csv
%env NETWORKX_BACKEND_PRIORITY=cugraph
%load_ext snakeviz
import pandas as pd
import networkx as nx

df = pd.read_csv("cit-Patents.csv", sep=" ", names=["src", "dst"], dtype="int32")
```

```python
%%snakeviz
# Profile graph construction (CPU) plus PageRank (dispatched to cuGraph)
G = nx.from_pandas_edgelist(df.head(1000000), source="src", target="dst")
pr = nx.pagerank(G, alpha=0.9)
```
But cuGraph already supports `from_pandas_edgelist`, and it is much faster (under 100 ms vs. 8 s in this case):
```python
import cugraph

%timeit -n3 -r3 G_gpu = cugraph.from_pandas_edgelist(df.head(1000000), source="src", destination="dst")
```

```
71.2 ms ± 11.8 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
```
Describe your ideal solution
The following code should dispatch to the cuGraph backend for from_pandas_edgelist in addition to pagerank.
```python
# !wget https://data.rapids.ai/cugraph/datasets/cit-Patents.csv
%env NETWORKX_BACKEND_PRIORITY=cugraph
import pandas as pd
import networkx as nx

df = pd.read_csv("cit-Patents.csv", sep=" ", names=["src", "dst"], dtype="int32")
# Desired: both the constructor and the algorithm dispatch to the cuGraph backend
G = nx.from_pandas_edgelist(df.head(1000000), source="src", target="dst")
pr = nx.pagerank(G, alpha=0.9)
```
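Conceptually, the requested behavior is priority-ordered dispatch with a CPU fallback, applied to constructors as well as algorithms. The minimal pure-Python sketch that follows illustrates only that idea and is not NetworkX's actual dispatcher: `BACKENDS`, `cpu_from_pandas_edgelist`, `backend_priority`, and `dispatch` are hypothetical names, and it assumes `NETWORKX_BACKEND_PRIORITY` holds a comma-separated list of backend names.

```python
import os

# Hypothetical backend registry: backend name -> {function name -> implementation}.
# A real backend would build a GPU graph here; this toy version just tags the result.
BACKENDS = {
    "cugraph": {
        "from_pandas_edgelist": lambda edges: ("gpu", list(edges)),
    },
}


def cpu_from_pandas_edgelist(edges):
    """Hypothetical stand-in for the default CPU constructor."""
    return ("cpu", list(edges))


def backend_priority():
    """Read the priority list, assumed to be comma-separated (e.g. "cugraph")."""
    raw = os.environ.get("NETWORKX_BACKEND_PRIORITY", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def dispatch(func_name, cpu_impl, *args):
    """Use the first prioritized backend implementing func_name, else the CPU version."""
    for backend in backend_priority():
        impl = BACKENDS.get(backend, {}).get(func_name)
        if impl is not None:
            return impl(*args)
    return cpu_impl(*args)


edges = [(0, 1), (1, 2)]

os.environ["NETWORKX_BACKEND_PRIORITY"] = "cugraph"
print(dispatch("from_pandas_edgelist", cpu_from_pandas_edgelist, edges))
# → ('gpu', [(0, 1), (1, 2)])

os.environ["NETWORKX_BACKEND_PRIORITY"] = ""
print(dispatch("from_pandas_edgelist", cpu_from_pandas_edgelist, edges))
# → ('cpu', [(0, 1), (1, 2)])
```

Because the fallback lives in one place, a user who puts `cugraph` first would pay the CPU construction cost only for functions the backend does not implement.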
Describe any alternatives you have considered
No response
Additional context
No response
Code of Conduct
- [X] I agree to follow cuGraph's Code of Conduct
- [X] I have searched the open feature requests and have found no duplicates for this feature request