Add option to add edges from a scipy.sparse.coo_matrix and numpy cython support
Adding the option to read edges from coo_arrays by using the Cython NumPy API which should be significantly faster than just using the regular Python API.
There seems to be a problem in the CI where it doesn't find numpy for setup.py
It would be good to some simple benchmarks to check if the performance increased substantially.
Also, it would be nice to support COO arrays directly (in case the user does not use Scipy). That is almost no extra work.
Also, it would be nice to support COO arrays directly (in case the user does not use Scipy). That is almost no extra work.
scipy is currently a requirement to install NetworKit, so I don't think users won't have scipy. But I can do it if that is still desirable.
Well, it's just a matter of adding a function that access 3 numpy arrays instead of one scimy matrix. Once that is added, the scipy-based function can call this new one.
Some speedup measurements:
G = nk.generators.BarabasiAlbertGenerator(10, 100000).generate()
row = np.zeros(G.numberOfEdges(), dtype=np.uint)
col = np.zeros(G.numberOfEdges(), dtype=np.uint)
data = np.zeros(G.numberOfEdges(), dtype=np.double)
i = 0
for u,v in G.iterEdges():
row[i] = u
col[i] = v
data[i] = random.uniform(1.0, 100.0)
n = G.numberOfNodes()
S = sc.sparse.coo_matrix((data, (row, col)), dtype = np.double)
start = time.time()
for i in range(100):
nk.GraphFromCoo(S, n)
end = time.time()
print("Use Cython+Numpy interface with matrix (calling C++ addEdge directly): ", end-start)
start = time.time()
for i in range(100):
nk.GraphFromCoo(S)
end = time.time()
print("Use Cython+Numpy interface with matrix (calling Python addEdge): ", end-start)
start = time.time()
for i in range(100):
nk.GraphFromCoo((data,(row,col)), n)
end = time.time()
print("Use Cython+Numpy interface (calling C++ addEdge directly): ", end-start)
start = time.time()
for i in range(100):
nk.GraphFromCoo((data,(row,col)))
end = time.time()
print("Use Cython+Numpy interface (calling Python addEdge): ", end-start)
start = time.time()
for i in range(100):
G2 = nk.Graph(n, weighted = True)
for j in range(len(row)):
G2.addEdge(row[j], col[j], data[j])
end = time.time()
print("Use Python: ", end-start)
Results (the time for creating and add edges on 100x graphs with 100k nodes each):
Use Cython+Numpy interface with matrix (calling C++ addEdge directly): 0.6958749294281006
Use Cython+Numpy interface with matrix (calling Python addEdge): 6.129072904586792
Use Cython+Numpy interface (calling C++ addEdge directly): 0.5570602416992188
Use Cython+Numpy interface (calling Python addEdge): 6.020819902420044
Use Python: 25.36305522918701