Support for in-place (mutating) LinearMaps
I'm opening this issue to register strong interest in this planned functionality. It is crucial, for example, for building an efficient shift-and-invert scheme on top of KrylovKit that can compete with Arpack (which in my experience is far more fragile than KrylovKit, and of course not pure Julia).
Currently, when passing a function to, say, eigsolve, it is assumed that the function f(v) implements some linear map A * v, allocating a new vector on each call. The low-hanging fruit here is to also allow f!(w, v) with a preallocated output vector w, much like LinearMaps does. It would actually be nice to integrate KrylovKit with LinearMaps.
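To make the distinction concrete, here is a minimal sketch of the two calling conventions (the matrix `A` is just a stand-in, and the mutating signature is hypothetical, i.e. not something KrylovKit accepts today):

```julia
using LinearAlgebra

A = rand(100, 100)          # stand-in for whatever operator is being applied

# current convention accepted by eigsolve: each call allocates a fresh output vector
f(v) = A * v

# proposed LinearMaps-style mutating convention: write into a preallocated w
f!(w, v) = mul!(w, A, v)    # overwrites w with A * v, no allocation per call
```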
Thanks for opening this issue. Part of my motivation for KrylovKit was to get rid of the hassle of having to define a linear map first, but even more so to be able to use more general vectors in eigenvalue problems and linear problems than (Abstract)Vector.
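To illustrate the kind of generality I mean, here is a toy example I am making up for this comment (a Sylvester-type operator acting on matrices):

```julia
using KrylovKit

A, B = rand(4, 4), rand(4, 4)
X0 = rand(4, 4)     # the "vector" handed to eigsolve is itself a 4×4 matrix

# dominant eigenvalue of the linear operator X ↦ A*X + X*B on matrix space;
# KrylovKit only needs inner products, norms and linear combinations of X
vals, vecs, info = eigsolve(X -> A * X + X * B, X0, 1, :LM)
```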
While I agree that supporting in-place operations is probably the way to go, I was kind of hoping that garbage collection in recent Julia versions has become efficient enough that the overhead of memory allocation would be reasonable. Do you have a clear indication that this is not the case and that GC time is still a significant or limiting factor?
Well, I just compared a shift-and-invert interior eigenvalue computation with ArnoldiMethod (which supports mutating LinearMaps) and with KrylovKit (without mutation), and found a factor of ~~10~~ 5 slower runtime with the latter. I was assuming the reason was the repeated allocations (a factor of ~~200x~~ 5 more), but there might be other reasons (such as the generality of your vector-like objects).
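For reference, the operator `l` in the benchmarks below is a shift-and-invert map built from the matrix `h`; roughly something like the following sketch (the shift value and the LinearMaps wrapper are illustrative, not my exact benchmark code):

```julia
using LinearAlgebra, LinearMaps

σ = 0.0                        # target shift (illustrative)
F = lu(h - σ * I)              # factorize once; h is the matrix from the benchmarks below
l = LinearMap{ComplexF64}(x -> F \ x, size(h, 1))   # l * x solves (h - σ*I) * y = x
# eigenvalues of l are 1 / (λ - σ), hence the `1 ./` in the timings below
```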
The `@time` macro in Julia Base should give you a hint as to the total time spent in garbage collection. There will certainly be other factors contributing to this time difference as well.
I see rather small GC activity:

```julia
julia> x0 = complex.(rand(size(h,1))); @benchmark 1 ./ eigsolve(x -> l * x, $x0, 10)[1]
BenchmarkTools.Trial:
  memory estimate:  959.08 KiB
  allocs estimate:  3983
  --------------
  minimum time:     4.152 ms (0.00% GC)
  median time:      4.431 ms (0.00% GC)
  mean time:        4.614 ms (3.07% GC)
  maximum time:     56.379 ms (91.96% GC)
  --------------
  samples:          1083
  evals/sample:     1
```
For comparison, here is the ArnoldiMethod equivalent (I cannot `@btime` it because it complains about zero pivot vectors or some such). The difference is about a factor of 5, not 10, but note the allocations:
```julia
julia> @time 1 ./ partialschur(l, nev = 10)[1].eigenvalues
  0.008024 seconds (930 allocations: 323.156 KiB)
```
Another suggestion might be to enable multithreading. Due to the way KrylovKit works, it won't collect the different vectors that span the Krylov space into a matrix (because they might not be plain vectors). As a result, BLAS level 2 operations cannot be used for manipulating the vectors (inner products and linear combinations).
In principle I could add a special code path for when they are. But for now, what is in KrylovKit is my own set of multithreaded routines that are similar to BLAS level 2. So you might see a speed boost by starting Julia with `export JULIA_NUM_THREADS=4`, or however many cores you have available.
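As a quick sanity check that the threads were actually picked up (the count of 4 is of course just an example):

```julia
# Set the environment variable before launching Julia, e.g. in the shell:
#   export JULIA_NUM_THREADS=4
# Then, inside the session, confirm the number of threads Julia was started with:
Threads.nthreads()
```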