graph-benchmarks
graph-benchmarks copied to clipboard
Use pagerank_scipy instead of pagerank [networkx]?
In terms of raw performance networkx.pagerank_scipy
can be 4-5X faster than networkx.pagerank
. For the google.txt
file on my local machine.
In [4]: %%timeit
...: nx.pagerank(G, alpha=0.85, tol=1e-3, max_iter=10000000)
...:
40.1 s ± 5.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: %%timeit
...: nx.pagerank_scipy(G, alpha=0.85, tol=1e-3, max_iter=10000000)
...:
8.89 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)```
Thanks for sharing your results. There's a pagerank_numpy as well. For this benchmark I wanted to access it against a pure python implementation baseline.
Page rank calculations do not make a very good comparison between packages because with the naïve algorithm, i.e. iterating a matrix product, performance will depend on the targeted accuracy. How many iterations do you do before the result is "stable enough"?
You can try this out by setting the tol
parameter in NetworkX's implementation.
Some packages use more advanced algorithms that produce exact results and do not require a tolerance.