
Schur Factorisation Reordering fails

nrontsis opened this issue 6 years ago

When using Arnoldi on some (badly scaled) problems, trexc! throws LAPACKException(1) with the following stacktrace:

 [1] trexc!(::Int64, ::Int64, ::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}) at /Users/nrontsis/KrylovKit.jl/src/dense/linalg.jl:415
 [2] permuteschur!(::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::Array{Int64,1}) at /Users/nrontsis/KrylovKit.jl/src/dense/linalg.jl:335
 [3] _schursolve(::Function, ::Array{Float64,1}, ::Int64, ::Symbol, ::Arnoldi{ModifiedGramSchmidt2,Float64}) at /Users/nrontsis/KrylovKit.jl/src/eigsolve/arnoldi.jl:184
 [4] eigsolve(::Function, ::Array{Float64,1}, ::Int64, ::Symbol, ::Arnoldi{ModifiedGramSchmidt2,Float64}) at /Users/nrontsis/KrylovKit.jl/src/eigsolve/arnoldi.jl:108
 [5] #eigsolve#41 at /Users/nrontsis/KrylovKit.jl/src/eigsolve/eigsolve.jl:163 [inlined]
 [6] #eigsolve at ./none:0 [inlined]
 [7] #eigsolve#40 at /Users/nrontsis/KrylovKit.jl/src/eigsolve/eigsolve.jl:146 [inlined]
 [8] #eigsolve at ./none:0 [inlined] (repeats 2 times)

My understanding is that trexc! reorders a Schur factorization. However, the Q matrix passed to trexc! is not orthogonal. Is this expected?
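For reference, one quick sanity check is to look at how far the Schur vectors of a dense factorization are from being orthogonal (a minimal sketch for illustration only; this is not the code path KrylovKit takes internally):

using LinearAlgebra

A = randn(10, 10)
F = schur(A)            # F.T is quasi upper triangular, F.Z holds the Schur vectors
norm(F.Z' * F.Z - I)    # should be of the order of machine epsilon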

Julia version: 1.0.1 KrylovKit.jl version: latest master (28d74fd)

nrontsis avatar Dec 23 '18 20:12 nrontsis

Q comes directly out of a Schur factorization from LAPACK (hseqr!), so that is indeed very strange. Do you have a minimal working example?

Jutho avatar Dec 23 '18 20:12 Jutho

The problem appears inside a bigger iterative method of mine. While trying to create a minimal example, I noticed that calling eigsolve again on the same matrix with the same x0 usually works. Is there any randomness (apart from x0) embedded in eigsolve that might be causing this?

nrontsis avatar Dec 23 '18 20:12 nrontsis

There shouldn't be. Is your matrix an actual matrix (i.e. an `AbstractMatrix`) or is it a function? Does the function have any side effects? Are you sure it does not mutate the incoming vector?
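For example, if the linear map does mutate its argument, a wrapper acting on a copy avoids the side effect (a hypothetical sketch; apply! is an illustrative name, not part of KrylovKit):

using KrylovKit

# Suppose the user-supplied linear map overwrites its input in place:
apply!(x) = (x .*= 2; x)

# Wrap it so that eigsolve never sees a mutated vector:
safe_apply(x) = apply!(copy(x))
x0 = randn(10)
λ, V, info = eigsolve(safe_apply, x0, 1, :LR)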

Jutho avatar Dec 23 '18 21:12 Jutho

Okay, thank you. I will check my code again and come back to you.

nrontsis avatar Dec 23 '18 21:12 nrontsis

Minimal example: Download the attached JLD2 and run:

using KrylovKit
using JLD2

@load "error.jld2" A x0
λ, V, info = eigsolve(A, x0, 2, :LR)

nrontsis avatar Dec 24 '18 12:12 nrontsis

That's strange:

julia> λ, V, info = eigsolve(A, x0, 2, :LR)
(Complex{Float64}[0.153975+0.0im, 3.20255e-8+0.0im, 1.86884e-8+0.0im], Array{Complex{Float64},1}[[9.53135e-16+0.0im, -1.30201e-15+0.0im, 1.27389e-16+0.0im, -1.45716e-16+0.0im, -2.09105e-16+0.0im, -4.76706e-16+0.0im, -6.8875e-16+0.0im, -4.49207e-16+0.0im, 6.70572e-18+0.0im, -2.50869e-16+0.0im  …  -0.0327677+0.0im, -0.00120309+0.0im, -0.00388521+0.0im, 0.0133307+0.0im, 0.00403483+0.0im, -0.0139143+0.0im, 0.00414714+0.0im, 0.0265978+0.0im, -0.00467477+0.0im, -0.00206761+0.0im], [2.13434e-10+0.0im, -7.58207e-10+0.0im, -5.78059e-10+0.0im, 2.07219e-10+0.0im, -1.92211e-10+0.0im, 1.14499e-9+0.0im, 6.92057e-10+0.0im, 5.31202e-10+0.0im, -1.76579e-10+0.0im, 1.02903e-9+0.0im  …  -0.0240968+0.0im, 0.0199752+0.0im, -0.000910951+0.0im, 0.017899+0.0im, -0.0290804+0.0im, 0.0665985+0.0im, -0.0286177+0.0im, -0.00142595+0.0im, -0.0825563+0.0im, 0.077213+0.0im], [-5.30656e-11+0.0im, -2.69949e-10+0.0im, -6.87135e-11+0.0im, -2.19887e-10+0.0im, -1.13489e-10+0.0im, 1.54622e-10+0.0im, 3.25096e-11+0.0im, 6.26286e-12+0.0im, -1.38628e-11+0.0im, 1.38762e-10+0.0im  …  -0.016705+0.0im, 0.0254119+0.0im, 0.00103357+0.0im, 0.0625205+0.0im, -0.00921046+0.0im, 0.0618178+0.0im, -0.00796061+0.0im, 0.00116415+0.0im, -0.0530138+0.0im, 0.0800853+0.0im]], ConvergenceInfo: 3 converged values after 5 iterations and 78 applications of the linear map;
norms of residuals are given by (4.527248713753779e-60, 5.1762911875411585e-15, 1.1308865585393554e-13).
)

So it seems to work for me. Could you also post:

julia> versioninfo()
Julia Version 1.0.2
Commit d789231e99 (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

and

julia> import LinearAlgebra; LinearAlgebra.BLAS.vendor()
:openblas64

Jutho avatar Dec 24 '18 13:12 Jutho

julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_BINDIR = /Applications/Julia-1.0.app/Contents/Resources/julia/bin
  JULIA_DIR = /Applications/Julia-1.0.app/Contents/Resources/julia
  JULIA_PARDISO = /Users/nrontsis/anaconda/pkgs/mkl-2018.0.2-1/lib
julia> import LinearAlgebra; LinearAlgebra.BLAS.vendor()
:openblas64

Perhaps also try

for i = 1:100
    λ, V, info = eigsolve(A, x0, 2, :LR)
end

and

for i = 1:100
    λ, V, info = eigsolve(A, randn(size(A,1)), 2, :LR)
end

(I guess if it's a problem of finite precision then it might not always happen).

nrontsis avatar Dec 24 '18 13:12 nrontsis

Ok, the random initial vector does indeed trigger it. The error is not surprising:

*          = 1:  two adjacent blocks were too close to swap (the problem
*                is very ill-conditioned); T may have been partially
*                reordered, and ILST points to the first row of the
*                current position of the block being moved.

In principle, I am doing too much by fully sorting the Schur values in order of importance, i.e. if you select :LR you can be sure that real(λ[1]) >= real(λ[2]) >= .... This is not so with ARPACK, if I remember correctly. However, I do find this useful. Maybe there is a more robust way, but I will have to think about it.
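Roughly, doing that ordering as a post-processing step would look like this (a sketch only, with A and x0 as in the example above; not the actual implementation):

using KrylovKit

λ, V, info = eigsolve(A, x0, 2, :LR)

# Sort the converged eigenvalues by real part afterwards, instead of
# reordering the Schur factorization during the iterations.
p = sortperm(real.(λ); rev = true)
λ, V = λ[p], V[p]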

Jutho avatar Dec 24 '18 14:12 Jutho

Perhaps I am missing something, but what is preventing you from doing this sorting at the very end?

nrontsis avatar Dec 24 '18 19:12 nrontsis

@Jutho ARPACK does sort the eigenvalues in post-processing.

But it's true that you are doing a bit 'too much' work by sorting the Ritz values -- you merely have to partition them into three groups (locked, retained and removed Ritz values), which is what I'm doing here: https://github.com/haampie/ArnoldiMethod.jl/blob/master/src/run.jl#L293-L356

So if two pairs of eigenvalues are nearly identical, it doesn't matter if the swap fails, as long as both of them belong to the same group.
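As a rough illustration of that partitioning idea (a sketch; ritzvalues and howmany are placeholder names, and the real logic lives at the link above):

ritzvalues = randn(ComplexF64, 30)   # placeholder for the current Ritz values
howmany = 2

# Mark which Ritz values belong to the wanted group by real part,
# without imposing a total order inside the group.
p = partialsortperm(real.(ritzvalues), 1:howmany; rev = true)
wanted = falses(length(ritzvalues))
wanted[p] .= true    # only group membership matters; a failed swap
                     # inside one group is harmless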

haampie avatar Dec 25 '18 11:12 haampie

@nrontsis, yes of course, I should do that at the very end. Thanks for the pointers, @haampie.

How urgent is this problem, @nrontsis? I will definitely fix it, but I currently have some other priorities.

Jutho avatar Dec 26 '18 21:12 Jutho

Thanks. It is not very urgent. As a temporary solution, I use a try/catch statement that calls eigsolve again when it fails. This seems to work for now.
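The workaround looks roughly like this (a sketch; in my code I simply catch the exception and retry, and the exact exception type caught may differ):

using KrylovKit

function eigsolve_retry(A, x0, howmany, which; maxattempts = 5)
    for attempt in 1:maxattempts
        try
            return eigsolve(A, x0, howmany, which)
        catch err
            # retry on failure (e.g. the LAPACKException from trexc!);
            # give up after maxattempts
            attempt == maxattempts && rethrow(err)
        end
    end
end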

nrontsis avatar Dec 29 '18 08:12 nrontsis

I know this has been a very long time, but I've looked at this again recently. I tried changing the strategy to indeed only select a number of eigenvalues. However, this does not solve the problem. The matrix in the failing example is of size 450 x 450, and has one eigenvalue with a significant positive real part (the one you are probably interested in), 7 eigenvalues with a significant negative real part, and all other 442 eigenvalues seem to be extremely close to zero (of order 10^-8, both real and imaginary part). So whatever strategy I use to select a number of eigenvalues, I will never be able to group all of those eigenvalues which are very close together, and thus the problem can keep occurring.
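For experimenting, a matrix with a similarly clustered spectrum can be built synthetically (a sketch; it mimics the structure described above but may or may not trigger the same trexc! failure on any given run):

using LinearAlgebra, KrylovKit

n = 450
# one clearly positive eigenvalue, 7 clearly negative ones,
# and 442 eigenvalues of order 1e-8
d = vcat(0.15, -collect(1.0:7.0), 1e-8 .* randn(n - 8))
Q, _ = qr(randn(n, n))
A = Matrix(Q) * Diagonal(d) * Matrix(Q)'
λ, V, info = eigsolve(A, randn(n), 2, :LR)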

I tried to see how @haampie 's ArnoldiMethod.jl gets around this, but he has his own implementation of all the LAPACK functionality such as Schur reordering, and it seems these routines can never throw errors, unlike LAPACK.

Jutho avatar Mar 16 '22 16:03 Jutho

Thanks for coming back to this!

For what it's worth, I was interested in the two rightmost eigenvalues.

The eigenproblem in question arose from solving a Trust Region Subproblem. It can be shown that the second rightmost eigenpair corresponds to a local minimum of that subproblem, or gives a certificate of the absence of said local minimiser (see Theorem 2 in this paper).

From a practical point of view, I think one can't really expect good performance from a Krylov eigensolver in finding more than one rightmost eigenvalue of this matrix, so I think this problem is only useful as a "hard eigenproblem example".

nrontsis avatar Mar 16 '22 23:03 nrontsis