Support for in-place (mutating) LinearMaps
I'm opening this issue to register strong interest in this planned functionality. It is crucial, for example, for building an efficient shift-and-invert scheme on top of KrylovKit that can compete with Arpack (which in my experience is far more fragile than KrylovKit, and of course not pure Julia).
Currently, when passing a function to, say, eigsolve, it is assumed that the function f(v) implements some linear map A * v, allocating a new vector on each call. The low-hanging fruit here is to also allow f!(w, v) with a preallocated output vector w, much like LinearMaps does. It would actually be nice to integrate KrylovKit with LinearMaps.
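To make the distinction concrete, here is a minimal sketch of the two calling conventions (the matrix `A` is just a stand-in, and the mutating signature is hypothetical, i.e. not something KrylovKit accepts today):

```julia
using LinearAlgebra

A = rand(100, 100)          # stand-in for whatever operator is being applied

# current convention accepted by eigsolve: each call allocates a fresh output vector
f(v) = A * v

# proposed LinearMaps-style mutating convention: write into a preallocated w
f!(w, v) = mul!(w, A, v)    # overwrites w with A * v, no allocation per call
```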
Thanks for opening this issue. Part of my motivation for KrylovKit was to get rid of the hassle of having to define a linear map first, but even more so to be able to use more general vectors in eigenvalue problems and linear problems than (Abstract)Vector.
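To illustrate the kind of generality I mean, here is a toy example I am making up for this comment (a Sylvester-type operator acting on matrices):

```julia
using KrylovKit

A, B = rand(4, 4), rand(4, 4)
X0 = rand(4, 4)     # the "vector" handed to eigsolve is itself a 4×4 matrix

# dominant eigenvalue of the linear operator X ↦ A*X + X*B on matrix space;
# KrylovKit only needs inner products, norms and linear combinations of X
vals, vecs, info = eigsolve(X -> A * X + X * B, X0, 1, :LM)
```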
While I agree that supporting in-place operations is probably the way to go, I was kind of hoping that garbage collection in recent Julia versions has become efficient enough that the overhead of memory allocation would be reasonable. Do you have a clear indication that this is not the case and that GC time is still a significant or limiting factor?
Well, I just compared a shift-and-invert interior eigenvalue computation with ArnoldiMethod (which supports mutating LinearMaps) and with KrylovKit (without mutation), and found a factor of ~~10~~ 5 slower runtime with the latter. I was assuming the reason was the repeated allocations (a factor of ~~200x~~ 5 more), but there might be other reasons (such as the generality of your vector-like objects).
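For reference, the operator `l` in the benchmarks below is a shift-and-invert map built from the matrix `h`; roughly something like the following sketch (the shift value and the LinearMaps wrapper are illustrative, not my exact benchmark code):

```julia
using LinearAlgebra, LinearMaps

σ = 0.0                        # target shift (illustrative)
F = lu(h - σ * I)              # factorize once; h is the matrix from the benchmarks below
l = LinearMap{ComplexF64}(x -> F \ x, size(h, 1))   # l * x solves (h - σ*I) * y = x
# eigenvalues of l are 1 / (λ - σ), hence the `1 ./` in the timings below
```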
The `@time` macro in Julia Base should give you a hint as to the total time spent in garbage collection. There will certainly be other factors contributing to this time difference as well.
I see rather small GC activity:

```julia
julia> x0 = complex.(rand(size(h,1))); @benchmark 1 ./ eigsolve(x -> l * x, $x0, 10)[1]
BenchmarkTools.Trial:
  memory estimate:  959.08 KiB
  allocs estimate:  3983
  --------------
  minimum time:     4.152 ms (0.00% GC)
  median time:      4.431 ms (0.00% GC)
  mean time:        4.614 ms (3.07% GC)
  maximum time:     56.379 ms (91.96% GC)
  --------------
  samples:          1083
  evals/sample:     1
```
For comparison, here is the ArnoldiMethod equivalent (I cannot `@btime` it because it complains about zero pivot vectors or some such). The difference is about a factor of 5, not 10, but note the allocations:
```julia
julia> @time 1 ./ partialschur(l, nev = 10)[1].eigenvalues
  0.008024 seconds (930 allocations: 323.156 KiB)
```
Another suggestion might be to enable multithreading. Due to the way KrylovKit works, it won't collect the different vectors that span the Krylov space into a matrix (because they might not be plain vectors). As a result, BLAS level 2 operations cannot be used for manipulating the vectors (inner products and linear combinations).
In principle I could add a special code path for when they are. But for now, what is in KrylovKit is my own set of multithreaded routines that are similar to BLAS level 2. So you might see a speed boost by starting Julia with `export JULIA_NUM_THREADS=4`, or however many cores you have available.
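As a quick sanity check that the threads were actually picked up (the count of 4 is of course just an example):

```julia
# Set the environment variable before launching Julia, e.g. in the shell:
#   export JULIA_NUM_THREADS=4
# Then, inside the session, confirm the number of threads Julia was started with:
Threads.nthreads()
```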