scikit-matter icon indicating copy to clipboard operation
scikit-matter copied to clipboard

Rank-one updates and other potential performance gains for CUR

Open agoscinski opened this issue 11 months ago • 1 comments

This is a revive of the draft PR https://github.com/scikit-learn-contrib/scikit-matter/pull/86 (please look into it for further information) because I think it is worth to look into this more given that CUR outperforms FPS by far in regression quality and is often not used because it is so expensive to compute.

The core idea is to update the eigenvectors after a selection instead of recomputing them by an eigendecomposition. @ceriottm mentioned in a discussion that it was mathematically unstable for eigenvectors corresponding to degenerated eigenvalues. So this deserves some dedicated time look into this in detail.

Links:

  • please look at the links of the closed PR draft
  • Math explaining the core idea https://math.stackexchange.com/a/3625609
  • LAPACK function that might be required https://www.netlib.org/lapack/explore-html/d2/d24/group__aux_o_t_h_e_rcomputational_ga3c4a943599132aea3ac964c08392853a.html
  • I am not aware of any python bindings of LAPACK function but there exists a similar function in scipy https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lapack.dlasd4.html

agoscinski avatar Aug 24 '23 17:08 agoscinski

Ok I thought a bit more and I think I recall better what the issue was.

  1. these are some algorithms for rank-1 updates http://doi.org/10.1137/S089547989223924X and http://doi.org/10.1007/BF01396012 - you can see it's kind of a pain to get the eigenvectors
  2. the issue here is that for a big matrix we don't want to compute ALL eigenvectors, but only the top ones. this can be done efficiently, but the problem is then how to do a rank-one update of only some of the eigenvectors. this is the only article I had found for this, but couldn't find an implementation https://epubs.siam.org/doi/epdf/10.1137/18M1172120

ceriottm avatar Aug 25 '23 06:08 ceriottm