Speed and scalability improvements for graph multiresolution
The motivation for these modifications is to improve the scalability of the code so that it can handle much larger graphs.
With the baseline code, I couldn't compute graph pyramids for graphs with about 40,000 nodes: the peak memory consumption went above 500 GB! It turns out this was due to two main problems:
- In the `kron_reduction` function, the computation of the Schur complement was actually turning the sparse input matrices back into dense matrices. In addition, the Schur complement can be sped up by using a linear solver optimized for SDD (symmetric diagonally dominant) matrices, which graph Laplacians are. There are even more efficient algorithms out there (starting with a Cholesky decomposition), such as Koutis et al. (2011), arXiv:1102.4842, but I didn't find an out-of-the-box Python implementation and SuperLU seemed to do the job for me. I ended up rewriting bits of `kron_reduction` to ensure that the sparse matrices are not turned into dense matrices when not necessary, and I added `pygsp.utils.splu_inv_dot` as an optimized alternative to `scipy.sparse.linalg.spsolve` (see the first sketch below).
- In the `graph_sparsify` function, the resistance distances were computed by blunt matrix inversion in `pygsp.utils.resistance_distance`, which tries to invert a potentially very large Laplacian matrix. After a Kron reduction this Laplacian can have a lot of non-zero elements, and that is what caused the worst of the memory consumption in my case. However, the whole point of the Spielman-Srivastava spectral sparsification algorithm is to avoid computing this inverse: it only needs approximate resistances and provides a procedure to compute them. I implemented `pygsp.utils.approx_resistance_distance` to compute these distances based on the algorithm described in arXiv:0803.0929 (see the second sketch below). It still requires solving a linear system, for which I again used my customized `pygsp.utils.splu_inv_dot`, but it should be noted that there are much faster ways of doing this, including the near-linear-time algorithm described in arXiv:1209.5821v3.
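A minimal sketch of the first idea (my own schematic, not the exact code in this PR): factor the lower-right Laplacian block once with SuperLU and back-substitute against all right-hand sides, so the Schur complement never requires a dense inverse.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def splu_solve(A, B):
    # Factor A once with SuperLU and back-substitute for every column of B,
    # instead of forming A^{-1} explicitly.
    lu = splu(sparse.csc_matrix(A))
    if sparse.issparse(B):
        B = B.toarray()
    return lu.solve(B)

def kron_reduction_sketch(L, keep):
    # Kron reduction onto the vertex set `keep`:
    #   L_red = L[keep, keep] - L[keep, rest] @ inv(L[rest, rest]) @ L[rest, keep]
    L = sparse.csr_matrix(L)
    keep = np.asarray(keep)
    rest = np.setdiff1d(np.arange(L.shape[0]), keep)
    L_kk = L[keep, :][:, keep]
    L_kr = L[keep, :][:, rest]
    L_rr = L[rest, :][:, rest]
    # Solve L_rr @ X = L_rk rather than inverting L_rr or densifying L.
    X = splu_solve(L_rr, L_kr.T)
    return sparse.csr_matrix(L_kk - L_kr @ X)
```

Factoring once and reusing the factorization is what makes this competitive with repeated `spsolve` calls when there are many right-hand sides.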
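And a schematic of the Spielman-Srivastava resistance estimate from arXiv:0803.0929, again not the actual `pygsp.utils.approx_resistance_distance` implementation; in particular, the small diagonal regularization below is only there to keep SuperLU happy with the singular Laplacian, where a proper SDD solver would be the right tool.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def approx_resistance_sketch(W, epsilon=0.1, seed=0):
    # Upper-triangular entries of the (symmetric) weight matrix = edge list.
    W = sparse.coo_matrix(W)
    mask = W.row < W.col
    rows, cols, weights = W.row[mask], W.col[mask], W.data[mask]
    n, m = W.shape[0], len(weights)

    # Weighted signed incidence matrix B (m x n): +-sqrt(w_e) at the endpoints.
    data = np.concatenate([np.sqrt(weights), -np.sqrt(weights)])
    i = np.concatenate([np.arange(m), np.arange(m)])
    j = np.concatenate([rows, cols])
    B = sparse.coo_matrix((data, (i, j)), shape=(m, n)).tocsr()

    L = (B.T @ B).tocsc()                        # graph Laplacian

    # Random +-1 projection onto k = O(log n / eps^2) dimensions.
    rng = np.random.default_rng(seed)
    k = int(np.ceil(np.log(n) / epsilon ** 2))
    Q = rng.choice([-1.0, 1.0], size=(m, k)) / np.sqrt(k)

    # Solve L Z = B^T Q; L is singular (constant nullspace), hence the tiny
    # regularization here -- a real implementation should use an SDD solver.
    lu = splu(L + 1e-10 * sparse.eye(n, format='csc'))
    Z = lu.solve(B.T @ Q)                        # n x k embedding

    # R(u, v) is approximated by the squared distance between rows of Z.
    diff = Z[rows] - Z[cols]
    return (diff ** 2).sum(axis=1)               # one estimate per edge
```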
With these modifications, I can now compute graph multiresolutions in practice for graphs with about 50,000 nodes and 1e6 edges without running into memory issues on a standard desktop. The slowest step is now sampling the edges in `graph_sparsify`: `scipy.stats.rv_discrete` takes about 1 s to draw 10,000 deviates, and in my case the algorithm needed 16e6 of them, which means about 30 minutes spent just sampling random numbers while the rest of the algorithm only takes minutes. That is a bit silly, but at least it eventually works.
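For context, that sampling step just draws edge indices from a fixed discrete distribution over the edges; a rough sketch of what it amounts to (schematic, not the actual `graph_sparsify` code), where numpy's `Generator.choice` would presumably do the same draw much faster than `scipy.stats.rv_discrete`:

```python
import numpy as np
from scipy import stats

m = 1_000_000                      # number of candidate edges
n_samples = 16_000_000             # deviates needed by the sparsifier
probs = np.random.rand(m)
probs /= probs.sum()               # per-edge sampling probabilities

# Current approach (slow for large n_samples):
dist = stats.rv_discrete(values=(np.arange(m), probs))
# edge_idx = dist.rvs(size=n_samples)

# Plain numpy alternative (assumption: same distribution, much faster):
rng = np.random.default_rng()
edge_idx = rng.choice(m, size=n_samples, p=probs)
```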
I tried to comment and document these modifications in the hope that they might be useful to others. Everything seems to work for me, but someone should take a close look to check that it doesn't break anything.
Hi, thanks for the contribution. Could you check and correct the errors that were raised in the TravisCI checks? They seem to be mostly errors in the code examples in the documentation, raising errors such as "NameError: name 'sparse' is not defined", "NameError: name 'extract_submatrix' is not defined", or "NameError: name 'block' is not defined".
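(For reference, doctest examples run in a fresh namespace, so each snippet needs its own imports; a generic illustration, not the actual docstrings in this PR:)

```python
>>> from scipy import sparse   # without this import, doctest raises NameError
>>> M = sparse.eye(8, format='csr')
>>> M.shape
(8, 8)
```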
There's still a silly error in the doctest (it's one that I often forget is a problem):
```
File "pygsp/pygsp/utils.py", line 261, in utils.py
Expected:
    (8, 8)
    """
    M = M.tocoo()
Got:
    (8, 8)
```
There should be a blank line after `(8, 8)` in the docstring.
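To illustrate, doctest reads everything up to the next blank line as expected output, so the docstring needs something like this (a toy example, not the real `utils.py` docstring):

```python
def example():
    """Toy doctest.

    >>> (2 * 4, 2 * 4)
    (8, 8)

    The blank line above ends the expected output; without it, doctest
    treats this sentence as part of the expected result and the test fails.
    """
```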
My bad, it should be fine now.
Thanks for the contribution. :) Could you look at @rodrigo-pena's comments so that we can merge it?