networkit icon indicating copy to clipboard operation
networkit copied to clipboard

Add option to add edges from a scipy.sparse.coo_matrix and numpy cython support

Open Relux-the-Relux opened this issue 3 years ago • 3 comments

Adding the option to read edges from coo_arrays by using the Cython NumPy API which should be significantly faster than just using the regular Python API.

There seems to be a problem in the CI where it doesn't find numpy for setup.py

Relux-the-Relux avatar Jul 06 '22 15:07 Relux-the-Relux

It would be good to some simple benchmarks to check if the performance increased substantially.

Also, it would be nice to support COO arrays directly (in case the user does not use Scipy). That is almost no extra work.

avdgrinten avatar Jul 12 '22 14:07 avdgrinten

Also, it would be nice to support COO arrays directly (in case the user does not use Scipy). That is almost no extra work.

scipy is currently a requirement to install NetworKit, so I don't think users won't have scipy. But I can do it if that is still desirable.

Relux-the-Relux avatar Jul 12 '22 14:07 Relux-the-Relux

Well, it's just a matter of adding a function that access 3 numpy arrays instead of one scimy matrix. Once that is added, the scipy-based function can call this new one.

avdgrinten avatar Jul 13 '22 09:07 avdgrinten

Some speedup measurements:

G = nk.generators.BarabasiAlbertGenerator(10, 100000).generate()
row = np.zeros(G.numberOfEdges(), dtype=np.uint)
col = np.zeros(G.numberOfEdges(), dtype=np.uint)
data = np.zeros(G.numberOfEdges(), dtype=np.double)

i = 0
for u,v in G.iterEdges():
    row[i] = u
    col[i] = v
    data[i] = random.uniform(1.0, 100.0)

n = G.numberOfNodes()

S = sc.sparse.coo_matrix((data, (row, col)), dtype = np.double)
start = time.time()
for i in range(100):
    nk.GraphFromCoo(S, n)
end = time.time()

print("Use Cython+Numpy interface with matrix (calling C++ addEdge directly): ", end-start)

start = time.time()
for i in range(100):
    nk.GraphFromCoo(S)
end = time.time()

print("Use Cython+Numpy interface with matrix (calling Python addEdge): ", end-start)

start = time.time()
for i in range(100):
    nk.GraphFromCoo((data,(row,col)), n)
end = time.time()

print("Use Cython+Numpy interface (calling C++ addEdge directly): ", end-start)

start = time.time()
for i in range(100):
    nk.GraphFromCoo((data,(row,col)))
end = time.time()

print("Use Cython+Numpy interface (calling Python addEdge): ", end-start)

start = time.time()
for i in range(100):
    G2 = nk.Graph(n, weighted = True)
    for j in range(len(row)):
        G2.addEdge(row[j], col[j], data[j])
end = time.time()

print("Use Python: ", end-start)

Results (the time for creating and add edges on 100x graphs with 100k nodes each):

Use Cython+Numpy interface with matrix (calling C++ addEdge directly):  0.6958749294281006
Use Cython+Numpy interface with matrix (calling Python addEdge):  6.129072904586792
Use Cython+Numpy interface (calling C++ addEdge directly):  0.5570602416992188
Use Cython+Numpy interface (calling Python addEdge):  6.020819902420044
Use Python:  25.36305522918701

fabratu avatar Aug 07 '23 14:08 fabratu