
[WIP] rrule for eigen decomposition

Open • ajozefiak opened this PR 5 years ago • 2 comments

This PR implements the rrule for eigen decomposition following the paper referenced in #144.

I have tested the implementation locally, but I have yet to write tests for this PR, as the implementation (as presented in the paper) is a bit unconventional to me and I expect my PR will need modifications.

Several issues that I have encountered:

  • In the authors' derivation, the adjoint of the eigenvalues is not used in the adjoint of the eigen decomposition. This is reflected in my implementation.
  • The paper assumes that the input matrix is symmetric positive semidefinite (i.e., a covariance matrix).
  • In the paper, the forward pass relies on svd as the underlying implementation of the eigen decomposition (this is possible due to the assumption that the input is symmetric PSD). I assume calling eigen(X) for the forward pass dispatches to a sensible algorithm.
  • The proposed algorithm is to back-propagate through the power iteration, which requires choosing the number of iterations. This leads me to add an additional parameter to rrule, since eigen(X) is not iterative and does not expose a choice of iterations (see the sketch after this list).
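
As a concrete illustration (not the PR's code), here is a minimal sketch of the kind of k-step power iteration the paper differentiates through, for the dominant eigenpair only; k is the extra iteration parameter that eigen(X) does not expose:

```julia
using LinearAlgebra

# Hypothetical sketch (not the PR's code): a k-step power iteration for the
# dominant eigenpair only. The paper's full-spectrum scheme with deflation is
# more involved; this just illustrates the iterative structure one would
# differentiate through.
function power_iteration(X::AbstractMatrix, k::Integer)
    v = normalize(ones(size(X, 1)))   # fixed start vector, for illustration
    for _ in 1:k
        v = normalize(X * v)          # one power-iteration step
    end
    λ = dot(v, X * v)                 # Rayleigh quotient for the eigenvalue
    return λ, v
end
```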

My local tests show that the proposed adjoint for the eigen decomposition agrees with finite differencing a power-iteration algorithm, but in many situations it differs from finite differencing the eigen decomposition. This could be because I am incorrectly interpreting the algorithm outside the case where the adjoint of the eigenvectors, dL/dV, has at most one non-negative column.
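
For illustration, one way to set up that comparison with FiniteDifferences.jl, reusing the hypothetical power_iteration sketch above (this is not the PR's test code):

```julia
using LinearAlgebra, FiniteDifferences

# Illustrative comparison: finite-difference the true eigen decomposition and
# a k-step power iteration through the same sign-invariant scalar loss on the
# dominant eigenvector.
X = let A = randn(4, 4); A * A' end                            # symmetric PSD test matrix
c = randn(4)
loss_eigen(M) = dot(c, eigen(Symmetric(M)).vectors[:, end])^2  # dominant eigenvector
loss_pi(M) = dot(c, power_iteration(Symmetric(M), 500)[2])^2   # same loss via power iteration
fdm = central_fdm(5, 1)
g_eigen, = grad(fdm, loss_eigen, X)
g_pi, = grad(fdm, loss_pi, X)
# With k large enough these should agree for the dominant eigenpair; the
# discrepancies reported above concern the more general case.
```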

I will update this PR as I work through the above issues, but I am also opening it now for feedback along the way.

ajozefiak · Apr 28 '20 20:04

Can you clarify what you mean by this?

The adjoint of the eigenvalues is not used in the derivation of the adjoint of the eigen decomposition

Are the eigenvalues not scalars? Why is it surprising not to use the adjoint of the eigenvalues?


I am sorting out someone to check the math on this. This is cool.

oxinabox · Apr 29 '20 12:04

Does it make sense that it doesn't pull back the cotangent of the eigenvalues at all? Take a function that calls eigen to get the eigenvalues only and discards the eigenvectors. This method would pull back a sensitivity of 0 on A, but the usual version (e.g. see Zygote's at https://github.com/FluxML/Zygote.jl/blob/master/src/lib/array.jl#L570-L588) would pull back a nonzero sensitivity. I haven't studied the paper yet, but that looked funny.
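
For reference, a sketch of that usual rule for the symmetric case (my transcription of the standard formula, not the linked Zygote code verbatim); the Diagonal(Δλ) factor below is exactly the eigenvalue contribution in question:

```julia
using LinearAlgebra

# Standard symmetric-eigen pullback, assuming A = U * Diagonal(λ) * U' with
# incoming cotangents Δλ and ΔU:
#   Ā = U * (Diagonal(Δλ) + F .* (U' * ΔU)) * U',  F[i,j] = 1/(λ[j] - λ[i]) for i ≠ j
function eigen_pullback_sym(λ::AbstractVector, U::AbstractMatrix,
                            Δλ::AbstractVector, ΔU::AbstractMatrix)
    F = inv.(λ' .- λ)        # F[i,j] = 1/(λ[j] - λ[i]); Inf on the diagonal
    F[diagind(F)] .= 0       # the i == j terms do not contribute
    Ā = U * (Diagonal(Δλ) + F .* (U' * ΔU)) * U'
    return (Ā + Ā') / 2      # symmetrize, treating the input as constrained symmetric
end
```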

Answering my own question: comparing equations 8 and 10, they argue that as k → ∞ the power-iteration gradients converge to the gradients of the eigendecomposition minus the term that comes from the cotangent of the eigenvalues (which they treat as zero). So I think you could just add that term back in to get the full cotangent on A. That term is not unstable in the degenerate case anyway.
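
In the same notation, the dropped term on its own would look like this (a sketch, with names chosen here for illustration):

```julia
using LinearAlgebra

# The eigenvalue-cotangent contribution on its own: with eigen(A) = (λ, U)
# and Δλ the eigenvalue cotangent, the dropped term is U * Diagonal(Δλ) * U'.
# It has no 1/(λ[i] - λ[j]) factors, so it stays finite even when eigenvalues repeat.
eigenvalue_term(U::AbstractMatrix, Δλ::AbstractVector) = U * Diagonal(Δλ) * U'
```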

The proposed algorithm is to back-propagate through the power iteration, which requires choosing the number of iterations. This leads me to adding an additional parameter for rrule since eigen(X) is not iterative and does not allow a choice of iterations.

Would it be better to provide a kmax and use a maximum-error criterion to choose k via their Eq 14?
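
A minimal sketch of that shape, with a generic residual test standing in for Eq 14 (the actual bound from the paper is not reproduced here):

```julia
using LinearAlgebra

# Hypothetical shape of the suggestion: iterate up to kmax but stop early once
# an error estimate drops below tol. A generic residual test stands in for the
# paper's Eq 14, which is not reproduced here.
function power_iteration_adaptive(X::AbstractMatrix; kmax::Integer=1000, tol::Real=1e-10)
    v = normalize(ones(size(X, 1)))
    λ = dot(v, X * v)
    for _ in 1:kmax
        v = normalize(X * v)
        λ = dot(v, X * v)
        norm(X * v - λ * v) < tol && break   # stop when the eigenpair residual is small
    end
    return λ, v
end
```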

sethaxen · May 17 '20 06:05