
Updates to make `nx_cugraph.Graph` a drop-in replacement for `nx.Graph`, adds attrs for auto-dispatch for generators

Open rlratzel opened this issue 6 months ago • 0 comments

TODO:

  • Unit tests
  • Improve graph update methods (add_node(), et al.)
  • Update remaining graph classes

This updates the nx-cugraph Graph and DiGraph classes to inherit from nx.Graph and adds cached_property attributes that lazily convert the graph to a NetworkX Graph, cache the result, and expose the corresponding dictionaries. With these changes, an nx_cugraph.Graph instance is drop-in compatible with networkx functions that nx_cugraph does not yet support.
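The lazy convert-and-cache idea can be illustrated with a small standalone sketch. This is not the nx_cugraph implementation: the class name `LazyAdjacencyGraph`, its `src`/`dst` storage, and the `conversions` counter are all hypothetical, chosen only to show how `functools.cached_property` runs an expensive conversion once and reuses the result afterwards.

```python
from functools import cached_property


class LazyAdjacencyGraph:
    """Hypothetical stand-in for a backend graph; NOT the nx_cugraph class."""

    def __init__(self, src, dst):
        # Backend-native storage: parallel lists of edge endpoints (assumed layout).
        self.src = list(src)
        self.dst = list(dst)
        self.conversions = 0  # counts how often the expensive conversion runs

    @cached_property
    def adj(self):
        # Expensive step: build a dict-of-dicts adjacency mapping,
        # the general shape NetworkX uses for a directed graph.
        self.conversions += 1
        adj = {}
        for u, v in zip(self.src, self.dst):
            adj.setdefault(u, {})[v] = {}
            adj.setdefault(v, {})
        return adj


G = LazyAdjacencyGraph([0, 1, 2], [1, 2, 0])
first = G.adj   # first access triggers the conversion
second = G.adj  # second access is served from the cache
print(G.conversions, first is second)
```

The sketch does not cover what happens to the cached value when the graph is mutated; that behavior is outside what this example assumes.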

Combined with the changes to NetworkX in this PR, which auto-dispatch generators if they return compatible backend types and allow compatible backend types to fall back to networkx, these updates let users maximize end-to-end acceleration for their workflows without code changes.

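import networkx as nx
import pandas as pd

# Timer is a timing helper not shown in this excerpt (see the attached zcc_demo.py.txt).

edgelist_csv = "/datasets/cugraph/csv/directed/cit-Patents.csv"
edgelist_df = pd.read_csv(edgelist_csv, sep=" ", names=["src", "dst"], dtype="int32")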

with Timer("from_pandas_edgelist"):
    G = nx.from_pandas_edgelist(
        edgelist_df, source="src", target="dst", create_using=nx.DiGraph)

print(type(G))

with Timer("number of nodes and edges"):
    print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("pagerank"):
    pr = nx.pagerank(G)

with Timer("coloring"):
    c1 = nx.coloring.greedy_color(G)

with Timer("coloring (again)"):
    c2 = nx.coloring.greedy_color(G)

with Timer("adding a node"):
    G.add_edge(0, (3.14159, "string_in_tuple"))

print(type(G))
print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("re-running pagerank"):
    pr2 = nx.pagerank(G)

print(f"new vs. orig nodes: {pr2.keys() - pr.keys()}")

with Timer("pad_graph (this mutates the input graph)"):
    cc = nx.coloring.equitable_coloring.pad_graph(G, 11)

print(type(G))
print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("re-running pagerank"):
    pr3 = nx.pagerank(G)

print(f"new vs. orig nodes: {pr3.keys() - pr.keys()}")

Timer.print_total()
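The script relies on a `Timer` helper whose definition is not shown here. A minimal sketch of a compatible context manager, inferred only from how it is called and from the shape of the printed output, might look like the following. Every detail is an assumption; the real helper in the attached script may differ.

```python
import time
from datetime import timedelta


class Timer:
    """Hypothetical reconstruction; the real helper lives in the attached script."""

    total = timedelta(0)  # time accumulated across all Timer blocks

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print(f"\n{self.name}...")
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = timedelta(seconds=time.perf_counter() - self._start)
        Timer.total += elapsed
        print(f"Done in: {elapsed}")
        return False  # do not swallow exceptions raised inside the block

    @classmethod
    def print_total(cls):
        print(f"Total time: {cls.total}")


with Timer("example step"):
    sum(range(1000))

Timer.print_total()
```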

No backends used:

(nx) root@8546eec3d49d:~# python zcc_demo.py

from_pandas_edgelist...
Done in: 0:00:50.219987
<class 'networkx.classes.digraph.DiGraph'>

number of nodes and edges...
G.number_of_nodes()=3774768, G.number_of_edges()=16518948
Done in: 0:00:01.851362

pagerank...
Done in: 0:01:10.388206

coloring...
Done in: 0:00:13.802888

coloring (again)...
Done in: 0:00:13.793485

adding a node...
Done in: 0:00:00.000018
<class 'networkx.classes.digraph.DiGraph'>
G.number_of_nodes()=3774769, G.number_of_edges()=16518949

re-running pagerank...
Done in: 0:01:03.532062
new vs. orig nodes: {(3.14159, 'string_in_tuple')}

pad_graph (this mutates the input graph)...
Done in: 0:00:00.000764
<class 'networkx.classes.digraph.DiGraph'>
G.number_of_nodes()=3774771, G.number_of_edges()=16518950

re-running pagerank...
Done in: 0:01:16.790938
new vs. orig nodes: {(3.14159, 'string_in_tuple'), 3774769, 3774770}
Total time: 0:04:50.379710

nx-cugraph backend used. nx-cugraph does not yet support nx.coloring.greedy_color() or nx.coloring.equitable_coloring.pad_graph(). Note that the first call to coloring includes the conversion to a networkx Graph, while the second uses the cached conversion:

(nx) root@8546eec3d49d:~# NETWORKX_BACKEND_PRIORITY=cugraph python zcc_demo.py

from_pandas_edgelist...
Done in: 0:00:00.664462
<class 'nx_cugraph.classes.digraph.DiGraph'>

number of nodes and edges...
G.number_of_nodes()=3774768, G.number_of_edges()=16518948
Done in: 0:00:00.000008

pagerank...
Done in: 0:00:03.741143

coloring...
Done in: 0:01:11.706015

coloring (again)...
Done in: 0:00:11.752219

adding a node...
Done in: 0:00:13.415563
<class 'nx_cugraph.classes.digraph.DiGraph'>
G.number_of_nodes()=3774769, G.number_of_edges()=16518949

re-running pagerank...
Done in: 0:00:00.878451
new vs. orig nodes: {(3.14159, 'string_in_tuple')}

pad_graph (this mutates the input graph)...
Done in: 0:00:13.069187
<class 'nx_cugraph.classes.digraph.DiGraph'>
G.number_of_nodes()=3774771, G.number_of_edges()=16518950

re-running pagerank...
Done in: 0:00:00.896314
new vs. orig nodes: {3774769, 3774770, (3.14159, 'string_in_tuple')}
Total time: 0:01:56.123361
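
To compare the two runs, the reported timings can be turned into ratios. The sketch below copies a few of the "Done in" and "Total time" values from the two outputs above and divides the no-backend time by the nx-cugraph time; a ratio below 1 means the nx-cugraph run was slower for that step (as with the first coloring call, which includes the conversion). The helper name `seconds` is just for illustration.

```python
def seconds(stamp):
    """Convert an 'H:MM:SS.ffffff' string to seconds."""
    h, m, s = stamp.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


# (no-backend time, nx-cugraph time), copied from the two runs above
timings = {
    "from_pandas_edgelist": ("0:00:50.219987", "0:00:00.664462"),
    "pagerank": ("0:01:10.388206", "0:00:03.741143"),
    "coloring": ("0:00:13.802888", "0:01:11.706015"),
    "coloring (again)": ("0:00:13.793485", "0:00:11.752219"),
    "total": ("0:04:50.379710", "0:01:56.123361"),
}

speedups = {
    name: seconds(baseline) / seconds(accelerated)
    for name, (baseline, accelerated) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```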

Also note, when debug logging is enabled, you can see calls made from within networkx functions being dispatched appropriately:

pad_graph (this mutates the input graph)...
DEBUG:networkx.utils.backends:no backends are available to handle the call to `pad_graph` with graph types {'cugraph'}
DEBUG:networkx.utils.backends:falling back to backend 'networkx' for call to `pad_graph' with args: (<nx_cugraph.classes.digraph.DiGraph object at 0x7efb84138d60>, 11), kwargs: {}
DEBUG:networkx.utils.backends:using backend 'cugraph' for call to `complete_graph' with args: (2, None), kwargs: {}
DEBUG:networkx.utils.backends:using backend 'cugraph' for call to `relabel_nodes' with args: (<nx_cugraph.classes.graph.Graph object at 0x7efb84139c60>, {0: 3774769, 1: 3774770}, True), kwargs: {}
Done in: 0:00:13.226258
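
The description does not say how debug logging was enabled for this run. One plausible way, assuming the logger name `networkx.utils.backends` seen in the log prefix above, is to use the standard `logging` module before calling any networkx functions:

```python
import logging

# Install a default handler on the root logger; its default format is
# LEVEL:logger_name:message, the same shape as the lines shown above.
logging.basicConfig(level=logging.WARNING)

# Turn on DEBUG only for the dispatch logger (name taken from the log prefix).
backend_logger = logging.getLogger("networkx.utils.backends")
backend_logger.setLevel(logging.DEBUG)
```

Scoping DEBUG to this one logger keeps output from other libraries at the WARNING level.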

zcc_demo.py.txt

rlratzel, Jul 27 '24 09:07