
nx-cugraph: add `ZeroGraph` for nx-compatibility (zero-code change)

eriknw opened this issue 1 year ago • 1 comment

This is an alternative approach to #4558 for enabling GPU-accelerated NetworkX to "just work"; it shares many similarities with that PR. I opted to create separate classes such as ZeroGraph, which I think makes for a cleaner separation and gives both us and users more control.

There are a few lingering TODOs and code comments to tidy up, but I don't think there are any show-stoppers. I have not yet updated methods (such as number_of_nodes) to optimistically use the GPU when possible, because that is not strictly necessary, but we should do so soon.
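To illustrate the "optimistic" method dispatch described above, here is a minimal, self-contained sketch. All names (`ZeroGraphSketch`, `_cpu_adj`, `_gpu_graph`) are invented for illustration and are not the actual nx-cugraph implementation; the idea is simply to try the device-side representation first and fall back to the plain-Python one.

```python
# Hypothetical sketch of optimistic GPU dispatch for simple graph methods.
# None of these names come from nx-cugraph itself.

class ZeroGraphSketch:
    """Toy stand-in for a graph holding a CPU dict and/or a GPU object."""

    def __init__(self):
        self._cpu_adj = {}       # plain-Python adjacency (always available here)
        self._gpu_graph = None   # placeholder for a device-side representation

    @property
    def _is_on_gpu(self):
        return self._gpu_graph is not None

    def number_of_nodes(self):
        # Optimistically use the GPU object when present, else fall back to CPU.
        if self._is_on_gpu:
            return self._gpu_graph.number_of_nodes()  # hypothetical API
        return len(self._cpu_adj)


G = ZeroGraphSketch()
G._cpu_adj.update({0: {1}, 1: {0}})
print(G.number_of_nodes())  # -> 2 (falls back to the CPU adjacency)
```

The point of the fallback is that methods stay correct whether or not a GPU copy currently exists; adding the GPU fast path later is purely an optimization.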

I have run NetworkX tests with these classes using https://github.com/networkx/networkx/pull/7585 and https://github.com/networkx/networkx/pull/7600. We need the behavior in 7585, and 7600 is only useful for testing.

There are only 3 new failing tests, and 3 tests that hang (I'll run them overnight to see if they finish). Here's a test summary:

5548 passed, 24 skipped, 16 xfailed, 25 xpassed

Note that 25 tests that were failing now pass. I have not investigated test failures, xfails, or xpasses yet. I would like to add tests too.

We rely heavily on the NetworkX cache; I think this is the preferred approach.
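As a rough sketch of why caching matters here: NetworkX graphs carry a `__networkx_cache__` dict used to cache backend conversions, so a graph converted to a backend once can be reused without reconverting. The key layout and class below are invented for illustration; the real mechanism lives inside NetworkX's backend dispatch machinery.

```python
# Minimal sketch of backend-conversion caching in the spirit of the
# NetworkX per-graph cache. The "backends" key layout is an assumption
# made for this demo, not the actual internal format.

class CachedConversionGraph:
    def __init__(self, edges):
        self.edges = list(edges)
        self.__networkx_cache__ = {}  # NetworkX graphs carry a cache dict like this
        self.conversions = 0          # count real conversions for the demo

    def to_backend(self, backend):
        cache = self.__networkx_cache__.setdefault("backends", {})
        if backend not in cache:
            self.conversions += 1  # expensive conversion happens only once
            cache[backend] = ("converted", backend, tuple(self.edges))
        return cache[backend]


G = CachedConversionGraph([(0, 1), (1, 2)])
G.to_backend("cugraph")
G.to_backend("cugraph")  # second call hits the cache
print(G.conversions)     # -> 1
```

This is why the second `nx.pagerank` call in the transcript below is roughly an order of magnitude faster: the CPU-to-GPU conversion is paid once and then served from the cache.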

It is late for me. I will describe and show how and why this works later.

I opted for zero= and ZeroGraph, because I find them delightful! Renaming is trivial if other terms are preferred.

CC @quasiben

eriknw avatar Aug 22 '24 21:08 eriknw

Some examples when using https://github.com/networkx/networkx/pull/7585

In [1]: import networkx as nx
In [2]: import nx_cugraph as nxcg
In [3]: import pandas as pd
In [4]: import logging

In [5]: nxl = logging.getLogger("networkx")
In [6]: nxl.addHandler(logging.StreamHandler())
In [7]: nxl.setLevel(logging.DEBUG)

In [8]: # Configure graph generators to use cugraph.
In [9]: # Can also be set via NETWORKX_BACKEND_PRIORITY_GENERATORS environment variable
In [10]: nx.config.backend_priority.generators = ["cugraph"]

In [11]: # DataFrame with int and string dtypes for edge data
In [12]: df = pd.DataFrame({"source": ['a', 'b'], "target": ['b', 'c'], "x": [1, 2], "y": ["a", "b"]})

In [13]: # Only use int edge data; is on GPU
In [14]: G = nx.from_pandas_edgelist(df, edge_attr=["x"])
Call to `from_pandas_edgelist' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `from_pandas_edgelist' with arguments: (df=  source target  x  y
0      a      b  1  a
1      b      c  2  b, source='source', target='target', edge_attr=['x'], create_using=None, edge_key=None)

In [15]: G._is_on_gpu
Out[15]: True

In [16]: G._is_on_cpu
Out[16]: False

In [17]: # Use both int and str edge data; is on CPU
In [18]: G = nx.from_pandas_edgelist(df, edge_attr=["x", "y"])
Call to `from_pandas_edgelist' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `from_pandas_edgelist' with arguments: (df=  source target  x  y
0      a      b  1  a
1      b      c  2  b, source='source', target='target', edge_attr=['x', 'y'], create_using=None, edge_key=None)
Backend 'cugraph' raised NotImplementedError when calling `from_pandas_edgelist'
Trying next backend: 'networkx'
Call to `empty_graph' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `empty_graph' with arguments: (n=0, create_using=None, default=<class 'networkx.classes.graph.Graph'>)

In [19]: G._is_on_gpu
Out[19]: False

In [20]: G._is_on_cpu
Out[20]: True

In [21]: # This algorithm only uses int edge data; use GPU
In [22]: %time nx.pagerank(G, weight="x")
Call to `pagerank' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `pagerank' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f471355b650>, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight='x', dangling=None)
CPU times: user 55.3 ms, sys: 36.2 ms, total: 91.5 ms
Wall time: 121 ms
Out[22]: {'a': 0.18783806264400482, 'b': 0.48648586869239807, 'c': 0.3256761133670807}

In [23]: # The previous call cached to the GPU, so subsequent calls using this data are faster
In [24]: %time nx.pagerank(G, weight="x")
Call to `pagerank' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `pagerank' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f471355b650>, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight='x', dangling=None)
CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 10.3 ms
Out[24]: {'a': 0.18783806264400482, 'b': 0.48648586869239807, 'c': 0.3256761133670807}

In [25]: # Creating an empty graph also returns an nx-cugraph graph
In [26]: G = nx.empty_graph()
Call to `empty_graph' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `empty_graph' with arguments: (n=0, create_using=None, default=<class 'networkx.classes.graph.Graph'>)

In [27]: # Mutation always forces us to the CPU (for now)
In [28]: G.add_edge(0, 1, x=1)

In [29]: G._is_on_gpu
Out[29]: False

In [30]: G._is_on_cpu
Out[30]: True

In [31]: # But we can get the GPU graph
In [32]: G._cugraph  # Suggestions for a better name?
Out[32]: <nx_cugraph.classes.graph.Graph at 0x7f471462c790>

In [33]: G._is_on_gpu
Out[33]: True

In [34]: G._is_on_cpu
Out[34]: True

# This graph can be used (with no conversion) by normal networkx algorithms that nx-cugraph does not implement
In [35]: nx.harmonic_centrality(G)
Call to `harmonic_centrality' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Backend 'cugraph' does not implement `harmonic_centrality'
Trying next backend: 'networkx'
Converting input graphs from 'cugraph' backend to 'networkx' backend for call to `harmonic_centrality'
Caching converted graph (from 'cugraph' to 'networkx' backend) in call to `harmonic_centrality' for 'G' argument
Call to `shortest_path_length' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `shortest_path_length' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f4713ebc890>, source=0, target=None, weight=None, method='dijkstra')
Call to `shortest_path_length' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `shortest_path_length' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f4713ebc890>, source=1, target=None, weight=None, method='dijkstra')
Out[35]: {0: 1.0, 1: 1.0}

In [36]: # We can also use string or any Python object as edge data
In [37]: G.add_edge(0, 2, y="hi")
In [38]: G.add_edge(1, 2, z=["x", object()])

# Let's compare performance of using `G.add_edge` in a tight loop
In [39]: %%timeit
    ...: G = nx.empty_graph(backend="networkx")
    ...: for i in range(1_000_000):
    ...:     G.add_edge(i, i+1)
    ...:
1.13 s ± 6.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [40]: %%timeit
    ...: G = nx.empty_graph(backend="cugraph")
    ...: for i in range(1_000_000):
    ...:     G.add_edge(i, i+1)
    ...:
1.67 s ± 6.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Some replies:

  • We can change names and choose whatever UX we think is best; the approach in this PR is flexible. The zero name was helpful during implementation and will be easy to find and replace
  • This will not break if the user disables the cache, but we will still use the cache internally
  • Functions that mutate the graph are not yet generally supported. Expect a solution for this release, but not by Tuesday
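The transcript above shows that mutation currently forces the graph back to the CPU. A minimal sketch of that invalidation pattern, with all names invented for illustration (this is not the nx-cugraph code):

```python
# Hedged sketch of "mutation forces us to the CPU": any mutating call
# drops the device-side object so it must be rebuilt on next use.

class MutableZeroGraphSketch:
    def __init__(self):
        self._cpu_adj = {}
        self._gpu_graph = None

    @property
    def _is_on_gpu(self):
        return self._gpu_graph is not None

    def _ensure_gpu(self):
        # Stand-in for a real CPU->GPU conversion.
        if self._gpu_graph is None:
            self._gpu_graph = dict(self._cpu_adj)
        return self._gpu_graph

    def add_edge(self, u, v):
        self._cpu_adj.setdefault(u, set()).add(v)
        self._cpu_adj.setdefault(v, set()).add(u)
        self._gpu_graph = None  # invalidate the GPU copy on mutation


G = MutableZeroGraphSketch()
G.add_edge(0, 1)
G._ensure_gpu()
print(G._is_on_gpu)  # True: a GPU copy now exists
G.add_edge(1, 2)
print(G._is_on_gpu)  # False: mutation dropped the GPU representation
```

Invalidate-on-mutation keeps the two representations from silently diverging, at the cost of reconverting after each batch of mutations; it also explains the `add_edge` timing gap measured above.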

In addition to refining the final UX, this has a couple of rough edges to smooth out, and we should do more thorough testing (and optimizing), but as the examples above show, it is extremely capable today.

I have gone through a couple iterations to try to get CI to pass.

eriknw avatar Aug 25 '24 21:08 eriknw

Can you also update the description and title of this PR since the implementation is quite different now?

rlratzel avatar Sep 17 '24 13:09 rlratzel

/merge

rlratzel avatar Sep 24 '24 15:09 rlratzel