SciMLOperators.jl
[later] kronsum
Da ⊕ Db = (Da ⊗ Ib) + (Ia ⊗ Db)
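Quick sanity check of the identity with dense matrices and Base.kron (just for illustration, not part of the implementation):
using LinearAlgebra
Da, Db = rand(3, 3), rand(4, 4)
Ia, Ib = Matrix(I, 3, 3), Matrix(I, 4, 4)
kronsum = kron(Da, Ib) + kron(Ia, Db) # (Da ⊗ Ib) + (Ia ⊗ Db)
u = rand(3 * 4)
U = reshape(u, 4, 3) # inner (Db) index first, outer (Da) index second
@assert kronsum * u ≈ vec(Db * U + U * transpose(Da))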
With the current TensorProductOperator implementation: https://github.com/SciML/SciMLOperators.jl/blob/c9488881763bf838898f212f3aa071e4e987bc53/src/sciml.jl#L684-L724
Reference computation: a single component matvec.
using SciMLOperators, LinearAlgebra, BenchmarkTools
u = rand(12, 12 * 100)
v = rand(12, 12 * 100)
A = rand(12, 12)
@btime mul!($v, $A, $u); # 23.833 μs (0 allocations: 0 bytes)
Id = SciMLOperators.IdentityOperator{12}()
@btime mul!($v, $Id, $u); # 4.822 μs (0 allocations: 0 bytes)
We want to be as close as possible to 24 + 5 ≈ 29 μs for both I ⊗ D and D ⊗ I. Once we make the tensor products efficient, we can figure out how to write a separate TensorSum wrapper later (one possible lazy composition is sketched below).
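In the meantime, a kronsum can already be composed lazily out of existing pieces - a hedged sketch, assuming + on two TensorProductOperators builds a lazily added operator that supports cache_operator and mul! (not verified here):
using SciMLOperators, LinearAlgebra
Da = rand(12, 12)
Db = rand(12, 12)
Ia = SciMLOperators.IdentityOperator{12}()
Ib = SciMLOperators.IdentityOperator{12}()
L = TensorProductOperator(Da, Ib) + TensorProductOperator(Ia, Db) # Da ⊕ Db
u = rand(12^2, 100)
v = similar(u)
L = cache_operator(L, u)
mul!(v, L, u)
Whether this hits the ~29 μs target is entirely down to the two TensorProductOperator kernels, which is what the rest of these notes measure.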
Other component operations:
permutedims!
using LinearAlgebra, BenchmarkTools
u = rand(12, 12, 100)
v = copy(u)
perm = (2, 1, 3)
@btime permutedims!($v, $u, $perm) # 8.980 μs (0 allocations: 0 bytes)
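This matters because one way to compute (D ⊗ I)u is permute, one big mul!, permute back - a sketch with illustrative names, assuming the (inner, outer, batch) reshape used above:
using LinearAlgebra
D = rand(12, 12)
u = rand(12, 12, 100) # (inner, outer, batch)
v = similar(u)
t1 = similar(u) # workspace: permuted input
t2 = similar(u) # workspace: permuted output
perm = (2, 1, 3)
function kron_DI!(v, D, u, t1, t2, perm)
    permutedims!(t1, u, perm)                       # bring the D (outer) index to the front
    mul!(reshape(t2, 12, :), D, reshape(t1, 12, :)) # one matmul over all 1200 columns
    permutedims!(v, t2, perm)                       # restore the original layout
    v
end
kron_DI!(v, D, u, t1, t2, perm)
From the numbers above this path costs two permutes plus one dense mul!, i.e. roughly 2 × 9 + 24 ≈ 42 μs.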
mul! with identity - this is doing well here.
using SciMLOperators, LinearAlgebra, BenchmarkTools
u = rand(12, 12 * 100)
v = copy(u)
Id = SciMLOperators.IdentityOperator{12}()
@btime mul!($v, $Id, $u); # 4.790 μs (0 allocations: 0 bytes)
@btime mul!($v, I, $u); # 11.729 μs (0 allocations: 0 bytes)
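Presumably the IdentityOperator path is essentially a copy; for reference, the raw copy on the same buffers (no timing claimed here):
@btime copyto!($v, $u);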
I ⊗ D - this is pretty good, nice and close to the reference computation.
using SciMLOperators, LinearAlgebra, BenchmarkTools
u = rand(12^2, 100)
v = rand(12^2, 100)
A = SciMLOperators.IdentityOperator{12}() # outer
B = rand(12, 12) # inner
T = TensorProductOperator(A, B)
T = cache_operator(T, u)
@btime mul!($v, $T, $u); # 28.846 μs (1 allocation: 32 bytes)
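That matches expectations: with the identity on the outside, (I ⊗ B)u reduces to the reference mul! on the column reshape. A quick check, reusing u, v, B from the block above and assuming the standard vec/kron ordering (an assumption, not verified against the source):
v_direct = similar(v)
mul!(reshape(v_direct, 12, :), B, reshape(u, 12, :)) # (I ⊗ B)u = vec(B * U), batch in the columns
@show v_direct ≈ v # expected to be true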
D ⊗ I - this is not so hot. Let's also try this vector-wise.
using SciMLOperators, LinearAlgebra, BenchmarkTools
u = rand(12^2, 100)
v = rand(12^2, 100)
A = rand(12, 12) # outer
B = SciMLOperators.IdentityOperator{12}() # inner
T = TensorProductOperator(A, B)
T = cache_operator(T, u)
@btime mul!($v, $T, $u); # 63.360 μs (7 allocations: 304 bytes)
# this is not too hot, let's try doing it vector-wise.
u = [rand(12*12) for i=1:100]
v = [rand(12*12) for i=1:100]
T = cache_operator(T, u[1])
@btime for i=1:length($u) mul!($v[i], $T, $u[i]) end; # 228.426 μs (200 allocations: 2.00 MiB)
# this is worse :/
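Another way to write D ⊗ I without the permutes is a right-multiplication per batch slice, since (A ⊗ I)u is vec(U * Aᵀ) column-wise - a self-contained sketch (illustrative names; the (inner, outer, batch) layout is again an assumption):
using LinearAlgebra
A = rand(12, 12)      # outer factor
u = rand(12, 12, 100) # (inner, outer, batch)
v = similar(u)
function kron_DI_rmul!(v, A, u)
    for k in axes(u, 3)
        mul!(view(v, :, :, k), view(u, :, :, k), transpose(A)) # one 12 × 12 matmul per slice
    end
    v
end
kron_DI_rmul!(v, A, u)
This trades the two permutes and workspace for 100 small matmuls.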
Batchwise component matvec
using SciMLOperators, LinearAlgebra, BenchmarkTools
A = rand(12, 12)
u = [rand(12) for i=1:12*100]
v = [rand(12) for i=1:12*100] # separate buffers so v[i] doesn't alias u[i]
@btime for i=1:length($u); mul!($v[i], $A, $u[i]); end; # 104.498 μs (0 allocations: 0 bytes)
# let's just not do batchwise
I added short circuits in the TensorProductOperator mul! implementations for when we have IdentityOperators in tensor products.
https://github.com/SciML/SciMLOperators.jl/pull/59/commits/2eaede76a51b5cf96601e206020f004fc114002d
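Roughly, the shape such a short circuit can take (illustrative only, not the code in the commit - the kernels are the ones sketched earlier):
using LinearAlgebra
using SciMLOperators: IdentityOperator

# I ⊗ B: one big mul! on the column reshape
kron_mul_sketch!(v, ::IdentityOperator, B::AbstractMatrix, u) =
    mul!(reshape(v, size(B, 1), :), B, reshape(u, size(B, 2), :))

# A ⊗ I: right-multiply each batch slice by transpose(A)
function kron_mul_sketch!(v, A::AbstractMatrix, ::IdentityOperator, u)
    m = size(u, 1) ÷ size(A, 2)
    U = reshape(u, m, size(A, 2), :)
    V = reshape(v, m, size(A, 1), :)
    for k in axes(U, 3)
        mul!(view(V, :, :, k), view(U, :, :, k), transpose(A))
    end
    v
end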