graph-benchmarks icon indicating copy to clipboard operation
graph-benchmarks copied to clipboard

Use pagerank_scipy instead of pagerank [networkx]?

Open MridulS opened this issue 4 years ago • 2 comments

In terms of raw performance networkx.pagerank_scipy can be 4-5X faster than networkx.pagerank. For the google.txt file on my local machine.

In [4]: %%timeit
   ...: nx.pagerank(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
40.1 s ± 5.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %%timeit
   ...: nx.pagerank_scipy(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
8.89 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)```

MridulS avatar Aug 27 '20 21:08 MridulS

Thanks for sharing your results. There's a pagerank_numpy as well. For this benchmark I wanted to access it against a pure python implementation baseline.

timlrx avatar Sep 09 '20 13:09 timlrx

Page rank calculations do not make a very good comparison between packages because with the naïve algorithm, i.e. iterating a matrix product, performance will depend on the targeted accuracy. How many iterations do you do before the result is "stable enough"?

You can try this out by setting the tol parameter in NetworkX's implementation.

Some packages use more advanced algorithms that produce exact results and do not require a tolerance.

szhorvat avatar Nov 15 '23 12:11 szhorvat