Vedant Puri
I think the 'slow' part is that we are applying the operator to the column vectors of `u` one at a time: https://github.com/SciML/SciMLOperators.jl/blob/746c0d50a75067fae4121547a3c00bd4bbe27ce9/src/sciml.jl#L608-L610
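For context, the batched apply is possible at all because of the standard Kronecker identity `(A ⊗ B) * vec(U) == vec(B * U * transpose(A))`. A minimal sketch using plain `kron`, not the package's code:

```julia
using LinearAlgebra

# Kronecker identity underlying TensorProductOperator (illustration only):
# (A ⊗ B) * vec(U) == vec(B * U * Aᵀ)
A = rand(3, 3)            # outer factor
B = rand(4, 4)            # inner factor
U = rand(4, 3)            # one column of u, reshaped to a matrix

lhs = kron(A, B) * vec(U)
rhs = vec(B * U * transpose(A))
@assert lhs ≈ rhs
```

Applying this per column of `u` is exactly the linked loop; batching it over all columns at once is where the reshapes below come in.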
There's definitely something I'm doing wrong, because we are slower than LinearMaps even though LinearMaps allocates in its `mul!`. ```julia julia> using LinearMaps, LinearAlgebra, BenchmarkTools; A = ⊗(rand(12,12), rand(12,12), rand(12,12));...
@danielwe
`@turbo` didn't recognize `mul!` when I checked. And `L.outer` is not necessarily an `AbstractMatrix` (e.g. it can be an `AbstractSciMLOperator`, since we nest ops: `TensorProductOperator(A, B, C) === TensorProductOperator(A, TPPOp(B,C))`), so...
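For what it's worth, the nesting is justified by associativity of the Kronecker product. A quick check with plain matrices (illustrative, not package code):

```julia
using LinearAlgebra

# A ⊗ B ⊗ C == A ⊗ (B ⊗ C): folding the trailing factors into a nested
# product is lossless, which is what the nested-op construction relies on
A, B, C = rand(2, 2), rand(3, 3), rand(4, 4)
@assert kron(kron(A, B), C) ≈ kron(A, kron(B, C))
```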
@chriselrod I'm not sure how `view` without indexing would work. > https://github.com/SciML/SciMLOperators.jl/blob/746c0d50a75067fae4121547a3c00bd4bbe27ce9/src/sciml.jl#L608-L610 In this case, the slices `C[:,:,i]`, `V[:,:,i]` are passed into `mul!` as `SubArray`s. But `L.outer` isn't guaranteed to...
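A stripped-down version of the slicing pattern in question, assuming a plain matrix for the outer operator (names here are illustrative):

```julia
using LinearAlgebra

# apply `outer` to each matrix slice of a 3-D array; @views makes the
# slices SubArrays, so mul! writes into C in place without copying
outer = rand(5, 5)
V = rand(5, 4, 6)
C = similar(V)

for i in axes(V, 3)
    @views mul!(C[:, :, i], outer, V[:, :, i])
end

@assert C[:, :, 1] ≈ outer * V[:, :, 1]
```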
In https://github.com/SciML/SciMLOperators.jl/pull/59 I've written an implementation using `permutedims!` that is faster than `LinearMaps.jl` and doesn't allocate. Could you take a look and suggest improvements?
With the current implementation, https://github.com/SciML/SciMLOperators.jl/blob/c9488881763bf838898f212f3aa071e4e987bc53/src/sciml.jl#L684-L724 ```julia using SciMLOperators, LinearAlgebra, BenchmarkTools u = rand(12^2, 100) v = rand(12^2, 100) A = rand(12, 12) # outer B = rand(12, 12) # inner T...
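For readers following along, here is a minimal sketch of the reshape-and-permute algorithm (using the allocating `permutedims` for brevity, where the PR preallocates caches and uses `permutedims!`; the function name is mine, not the package's):

```julia
using LinearAlgebra

# computes kron(A, B) * u for a batch of k columns via reshapes and
# dimension permutes, without ever forming kron(A, B)
function tp_mul(A, B, u, k)
    m, n = size(A, 1), size(B, 1)
    W = reshape(B * reshape(u, n, :), n, m, k)          # inner factor
    Q = A * reshape(permutedims(W, (2, 1, 3)), m, :)    # outer factor
    reshape(permutedims(reshape(Q, m, n, k), (2, 1, 3)), n * m, k)
end

A, B = rand(3, 3), rand(4, 4)
u = rand(12, 5)
@assert tp_mul(A, B, u, 5) ≈ kron(A, B) * u
```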
The two `permutedims!` calls together take about as much time as a single `mul!()`. So we should add more cases to the `L.outer` matvec computation to avoid the `permutedims!` wherever...
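One such case, assuming `L.outer` is an `AbstractMatrix`: contract it from the right as `transpose(A)`, slice by slice, which skips both permutes entirely (a sketch under that assumption, not the PR's code):

```julia
using LinearAlgebra

# W holds the inner-factor result with shape (n, m, k); right-multiplying
# each slice by Aᵀ is equivalent to permute → A * … → permute back
A = rand(3, 3)            # outer
W = rand(4, 3, 5)         # (n, m, k)
C = similar(W)

for i in axes(W, 3)
    @views mul!(C[:, :, i], W[:, :, i], transpose(A))
end

@assert C[:, :, 2] ≈ W[:, :, 2] * transpose(A)
```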
FYI currently we are at: ```julia using SciMLOperators, LinearAlgebra, BenchmarkTools A = TensorProductOperator(rand(12,12), rand(12,12), rand(12,12)) u = rand(12^3, 100) v = rand(12^3, 100) A = cache_operator(A, u) mul!(v, A, u)...
new idea: try using `PermutedDimsArray` instead of `permutedims`/`permutedims!`
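For reference, `PermutedDimsArray` is a lazy wrapper: it reindexes on access rather than copying, so it could stand in for the `permutedims!` copies provided the downstream `mul!` calls handle the strided layout well. A quick behavioral check:

```julia
# PermutedDimsArray is a view: no copy is made, and mutation of the
# parent array is visible through the wrapper
X = rand(4, 3, 5)
P = PermutedDimsArray(X, (2, 1, 3))

@assert size(P) == (3, 4, 5)
@assert P[2, 1, 3] == X[1, 2, 3]
X[1, 2, 3] = 0.0
@assert P[2, 1, 3] == 0.0
```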