Add Two-sided Arnoldi method
I've added an implementation of the Two-sided Krylov-Schur Restarted Arnoldi method in Ref. [1, Alg. 2] to KrylovKit.jl. The implementation is not done yet and needs some improvements (e.g., unit tests and an update to the convergence criterion).
It is currently only tested for finding the eigenvalues with smallest real part (:SR) for dense Julia matrices. The interface is as close as possible to the other methods, i.e., for a dense matrix A calling
```julia
eigsolve(A, collect(adjoint(A)), v0, w0, 4, :SR,
         BiArnoldi(; krylovdim=20, verbosity=10, maxiter=100))
```
yields the 4 eigenvalues with smallest real part, together with the corresponding left and right eigenvectors and convergence information.
Are you interested in merging this into KrylovKit.jl? If so, what should I additionally add (next to tests and the improved convergence criterion)?
I'm interested in using this method downstream (in ITensors.jl) and hence incorporating this into the package would be very much appreciated. I'm willing to do the extra work required, if it is not too much.
Thanks for your work!
[1] Krylov-Schur-Type Restarts for the Two-Sided Arnoldi Method, Ian N. Zwaan and Michiel E. Hochstenbach
Thanks for this; it seems like an impressive amount of work. I will have to look deeper into this, since I am on my way out of the office right now. Feel free to remind me if you haven't heard back in a few days.
A first comment: look at svdsolve and lssolve for the interface I use for methods that require both the linear map and its adjoint.
Related to this: since this algorithm cannot have the exact same arguments and outputs as the other eigsolve routines, I would pack it under a different name, such as bieigsolve or eigsolve2. At some point we were actually discussing having a new routine like this, when we believed that implementing the pullback of eigsolve would also require the corresponding left eigenvectors. However, since I then found a way to compute the pullback without needing the left eigenvectors, we halted the discussion of implementing such a method. I am still very supportive of having it, though.
Great to hear that this may find its way into the library! I've now updated the function signature to be closer to lssolve and renamed it to bieigsolve. Also, there are now tests and some further improvements to the convergence criterion etc.
However, in my tests (which I copied from the eigsolve tests), I get failures because MinimalVec does not support dot and normalize!, which I need for updating the oblique projection. Is this something to care about? Some @test_logs tests fail as well.
Also, the return type of bieigsolve is currently still too clumsy, as it returns both left/right eigenvalues/eigenvectors and convergence info. I'm interested in your suggestions on this. Additionally, the returned residuals are not the actual residuals; to obtain those, I would need to apply the linear map again to the obtained eigenvectors, which did not seem cost-efficient just for this information. This still needs improvement as well.
dot should just be inner. normalize! could be scale!(v, 1/norm(v)); the problem is probably that normalize! from LinearAlgebra.jl calls rmul! or something, which is indeed not defined for MinimalVec.
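A minimal sketch of this substitution (assuming the VectorInterface.jl primitives inner, scale!, and norm that KrylovKit builds on; the variable names here are purely illustrative):

```julia
using VectorInterface: inner, scale!, norm

v = randn(ComplexF64, 4)
w = randn(ComplexF64, 4)

s = inner(w, v)          # replaces LinearAlgebra.dot(w, v)
scale!(v, 1 / norm(v))   # replaces LinearAlgebra.normalize!(v)
```

Unlike normalize!, this formulation only relies on scale!, which should be covered by the minimal vector interface that MinimalVec implements.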
Codecov Report
:x: Patch coverage is 87.54717% with 33 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 88.70%. Comparing base (054a6ae) to head (29f185f).
:warning: Report is 1 commit behind head on master.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
- Coverage   88.90%   88.70%   -0.20%
==========================================
  Files          34       36       +2
  Lines        3685     3930     +245
==========================================
+ Hits         3276     3486     +210
- Misses        409      444      +35
```
@tipfom , are you ok with me pushing some changes directly to your branch?
@Jutho ~~Yes, I'll take care of it and also add docstrings :)~~
@Jutho, sorry, I misread your message. Of course you're very much welcome to push to this branch :)
Ok, I've simplified the BiArnoldi factorization and iteration to simply reuse everything from Arnoldi, as I think that is what it amounted to. It's easier to maintain the code and make improvements in the future if there is less code duplication.
In bieigsolve, I've currently only made the changes required to make it work in combination with the changed factorization type. I will go through that function in more detail but this will have to wait until tomorrow or Wednesday. However, the code is very nicely structured and written, so I don't expect anything stalling this much longer. Hats off for a very nice PR!
The eigenvalues are now properly sorted and the eigenvectors properly biorthonormalized.
My apologies for the slow progress from my side; it's a busy week. I've left another question in the comments.
My apologies for the long silence, I was on holiday last week and didn't really get to continue on this. I will try to finish my review / changes from my side this week.
Looking at this once again, I am making some more changes locally and will push them asap. However, I have two more questions:

- One of my comments regarding the residual computation was that I think for kappa_j you need not simply M[converged+1, converged+1] but rather Z[:, converged+1]' * M * Q[:, converged+1].
- Can you elucidate what the problem is with the eigenvalues of H and K not being matched up perfectly? Is this only about complex conjugate eigenvalue pairs in the case of using real arithmetic? Or is it more general? You now only fix this in the final step, where you compute the eigenvectors, but I would like for this to be true during the Schur process. I am not sure if it is strictly necessary, but I could foresee problems where, e.g., in deciding on the keep variable, it might be that both H and K have 2×2 blocks that misalign, so that no keep can be chosen that doesn't cut through blocks in either of those two matrices.
Actually, looking more into the computation of the convergence criterion, I don't understand why the authors want to use a relative precision where they divide by rho_j, being the eigenvalue. This doesn't make sense in my opinion. It can be perfectly valid to target eigenvalues that happen to be close to zero, as long as they are extremal. Furthermore, adding a multiple of the identity to an operator changes nothing about the Krylov subspace and thus about the method, but arbitrarily shifts eigenvalues and thus the meaning of relative errors. If you agree with this, I would rather opt for a convergence criterion that only uses absolute norms of residuals in comparison to tol.
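To spell the shift argument out: for any shift $\sigma$, the Krylov subspace is unchanged, $\mathcal{K}_m(A + \sigma I, v) = \mathcal{K}_m(A, v)$, since $(A + \sigma I)^k v$ lies in the span of $v, Av, \ldots, A^k v$. The residual norms $\|r_j\|$ produced by the method are therefore identical, while every Ritz value shifts as $\rho_j \mapsto \rho_j + \sigma$. A relative criterion $\|r_j\| \le \mathrm{tol} \, |\rho_j|$ can thus be made arbitrarily loose or strict by the choice of $\sigma$, whereas the absolute criterion $\|r_j\| \le \mathrm{tol}$ is shift-invariant.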
> One of my comments regarding the residual computation was that I think for kappa_j you need not simply M[converged+1, converged+1] but rather Z[:, converged+1]' * M * Q[:, converged+1].
Yes, that is correct. I'm however fine with dropping this prefactor altogether, as you suggested in your other comment. :)
> Can you elucidate what the problem is with the eigenvalues of H and K not being matched up perfectly? Is this only about complex conjugate eigenvalue pairs in the case of using real arithmetic? Or is it more general? You now only fix this in the final step, where you compute the eigenvectors, but I would like for this to be true during the Schur process. I am not sure if it is strictly necessary, but I could foresee problems where, e.g., in deciding on the keep variable, it might be that both H and K have 2×2 blocks that misalign, so that no keep can be chosen that doesn't cut through blocks in either of those two matrices.
I've only encountered this problem when working with (mathematically) real matrices, i.e., independently of the arithmetic type (complex or real), but those that have the property that both $\lambda$ and $\lambda^\star$ are eigenvalues. Then it may happen that, when sorting for smallest real part, the eigenvalues of $H$ and $K$ are $[\ldots, \lambda, \lambda^\star, \ldots]$ and $[\ldots, \lambda^\star, \lambda, \ldots]$ respectively. To prevent this, I implemented this sort-and-match algorithm. For the keep part, I thought that if we put the keep threshold in between the two, i.e., keep $\lambda$ in $H$ and $\lambda^\star$ in $K$, the two of them would technically be the same as unconverged eigenvectors.
That's as much as I know; sadly, I did not find any information on this in the paper. When I tried enforcing this explicit ordering in the Krylov decomposition by adjusting pK = sortperm(valuesK; by=by ∘ conj, rev=rev) to also match pH, I ran into the "cannot split 2x2 blocks when permuting schur decomposition" error when calling permuteschur! and therefore dropped this approach.
Ok, thanks for the comments; that is very helpful. When working with real arithmetic, a valid which argument should anyway be such that it doesn't distinguish between $\lambda$ and $\lambda^\ast$ (even though that is hard to check in full generality if it is provided as a general EigSorter by function). The complex conjugate pairs of eigenvalues are always returned by LAPACK in a fixed order within each pair. But indeed, since the eigenvalues of H and K are related by complex conjugation, this does mean that complex pairs of eigenvalues are actually swapped between H and K. But then it happens in a fully deterministic manner, and we don't need complicated matching logic, I believe.
Yes. Exactly due to this fact, the matching logic currently boils down to a linear loop which also normalizes the vectors. I'm also very open to simplifying it more :)
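As a toy illustration of a matching loop of this kind (made-up values, not the actual implementation), one could pair each eigenvalue of H with the conjugate of a not-yet-used eigenvalue of K, comparing with a single tolerance on the complex difference rather than on real and imaginary parts separately:

```julia
# Made-up Ritz values: the eigenvalues of K are the conjugates of those of H,
# but the members of a complex-conjugate pair may appear in swapped order.
valuesH = [1.0 + 0.0im, 2.0 - 0.5im, 2.0 + 0.5im]
valuesK = [1.0 + 0.0im, 2.0 - 0.5im, 2.0 + 0.5im]

perm = zeros(Int, length(valuesH))
used = falses(length(valuesK))
for (i, λ) in enumerate(valuesH)
    # single-tolerance comparison on the full complex difference
    j = findfirst(k -> !used[k] && isapprox(conj(valuesK[k]), λ; atol=1e-10),
                  eachindex(valuesK))
    perm[i] = j
    used[j] = true
end

@assert conj.(valuesK[perm]) ≈ valuesH  # pairing is consistent
```

This is just the linear loop described above, written out with dummy data; using isapprox on the complex values directly also sidesteps the componentwise-comparison issue mentioned later in this thread.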
I've now gotten around to using this method some more and implemented non-Hermitian DMRG with it. In this context, I noticed that the concerns raised by @lkdvos about the precision used in isapprox are very important: for early sweeps, the left/right eigenvalues may not match as cleanly as in the more constructed examples in the test suite. At the very least, the method should (i) expose the required precision as a parameter and (ii) not throw an error if the algorithm does not converge. Have you @Jutho implemented a different way to return the "correctly" matched eigenvalue pairs? Otherwise, I will think some more about it and commit a suggested fix.
I did some more digging and in SLEPc, they've also implemented the algorithm. However, they also need to run a match loop for the eigenvalues to be sorted properly when using the same LAPACK functionality, cf. here.
My apologies for the silence, I was unavailable all of last week. I will try to pick it up again this week and finally push this over the finish line.
Is there something else I can contribute to help get this finished @Jutho ? 😊
I tried to come up with a more elegant solution to the eigenvalue sorting; however, none of my attempts really yielded satisfying results. There are some improvements to the current implementation, e.g., changing the comparison for Complex values to compare only the norm of the difference, such that 1+1e-19im and 1+1e-16im would still compare as equal, which they currently do not because isapprox is applied to the real and the imaginary part separately. This is however only a minor improvement to the current implementation and does not fundamentally fix the issue.
My apologies for the silence here; I got a bit distracted by other code reviews. This week I am also not actively at work very much, but I will still try to find some time. If not, then definitely early next week I will make time for finishing this.
I've now updated the residual norm convention to be with rV and rW.
Ok, I finally got around to spending some more time on this PR, which was really needed for me to make progress. I have changed quite a few things, in particular the way the final eigenvectors are computed. I think that in the way I compute them, the ordering and biorthogonality of the left and right eigenvectors are automatically guaranteed. Of course, it depends on inverting (part of) the overlap matrix M.
I also changed the structure of the return values of both bieigsolve and _bischursolve. As I have not yet updated the tests, this will definitely lead to failures. I would first like to hear your thoughts on the current implementation before I also update the tests. However, I think the hard work is now done and this PR should be able to land soon.
Hi @Jutho, I like this new approach with inverting M much better! This now also allows the algorithm to be used to get the current best estimate of both left and right eigenvector without full convergence, which is really neat. Thanks! I've pushed the changes which I needed to get it to run on my local machine.
The only major change is that, at least for me, working with `conj(Q)` and `conj(Z)` and `h'` and `k'` yielded wrong results.
Can you maybe confirm that this happens for you as well, e.g., by running
```julia
using LinearAlgebra
using KrylovKit
using Random

function main()
    Random.seed!(111111)
    A = rand(ComplexF64, 1000, 1000)
    x0 = rand(eltype(A), 1000)
    fAr = x -> A * x
    fAl = x -> adjoint(A) * x
    vals, (V, W), info = bieigsolve((fAr, fAl), x0, x0, 3, :SR,
                                    BiArnoldi(krylovdim=10, maxiter=1000, tol=1e-10))
    @show eigvals(A)[1:5]
    for i in eachindex(vals)
        @show vals[i]
        @show KrylovKit.inner(W[i], fAr(V[i]))
    end
end

main()
```
@tipfom, the tests now run locally. I think the actual implementation is now complete, but the tests currently only cover the case with a single iteration of a small problem. Would you be willing to further complete the tests by adding a larger test case that is handled iteratively with more iterations, analogous to the other Arnoldi tests? I think that is the final thing that needs to happen in order to merge this PR.
@Jutho, I've now added the tests. Sorry for the long delay :)
Thanks, I'll take a look asap, such that this can be merged and a new KrylovKit version can be tagged. Was there a specific reason to remove the eager option?
Yes, I removed it because, as of now, the option was not working as intended, i.e., the unit tests failed with eager=true. However, I did not investigate this further and don't see an immediate reason why this would necessarily be the case. Is this something we aspire to include? If so, I could spend some time tomorrow figuring out whether it is an algorithmic constraint or just some operation I missed when using the eager option.
@tipfom , I finally made some time to look at this again, and would like to get this merged before tagging a new release of KrylovKit. However, currently, I get completely wrong results as soon as maxiter > 1, i.e. as soon as a restart takes place. Was this working before for you and did I manage to break something?
I did rebase this PR on the current master. Apparently I also had some uncommitted changes about the role of the h and k vectors, so maybe that is what messed something up?
It seems to be the eager option; with eager=false everything works like it should. That's probably why you removed it earlier? However, it should be possible to make this work; just a matter of finding the bug.