SciMLOperators.jl icon indicating copy to clipboard operation
SciMLOperators.jl copied to clipboard

[later] kronsum

Open vpuri3 opened this issue 2 years ago • 2 comments

Da \osum Db = (Da \otimes Ib) + (Ia \otimes Db)

vpuri3 avatar Jun 09 '22 16:06 vpuri3

With current TensorProductOperator implementation, https://github.com/SciML/SciMLOperators.jl/blob/c9488881763bf838898f212f3aa071e4e987bc53/src/sciml.jl#L684-L724

reference computation: - a single component matvec.

using LinearAlgebra, BenchmarkTools

u = rand(12, 12 * 100)
v = rand(12, 12 * 100)

A = rand(12, 12)

@btime mul!($v, $A, $u); # 23.833 μs (0 allocations: 0 bytes)

Id = SciMLOperators.IdentityOperator{12}()
@btime mul!($v, $Id, $u); # 4.822 μs (0 allocations: 0 bytes)

We want to be as close to 24+5 = 30\mu s for both I \otimes D and D \otimes I. Once we make TensorProducts efficient, we can figure out how to write a separate TensorSum wrapper later.

Other component operations:

permutedims!

using LinearAlgebra, BenchmarkTools
u = rand(12, 12, 100)
v = copy(u)
perm = (2, 1, 3)
@btime permutedims!($v, $u, $perm) # 8.980 μs (0 allocations: 0 bytes)

mul! with identity - doing good over here

using SciMLOperators, LinearAlgebra, BenchmarkTools
u = rand(12, 12* 100)
v = copy(u)
Id = SciMLOperators.IdentityOperator{12}()
@btime mul!($v, $Id, $u); #  4.790 μs (0 allocations: 0 bytes)
@btime mul!($v, I, $u); # 11.729 μs (0 allocations: 0 bytes)

I \otimes D - this is pretty good. nice and close to reference computation.

using SciMLOperators, LinearAlgebra, BenchmarkTools

u = rand(12^2, 100)
v = rand(12^2, 100)

A = SciMLOperators.IdentityOperator{12}() # outer
B = rand(12, 12) # inner

T = TensorProductOperator(A, B)
T = cache_operator(T, u)

@btime mul!($v, $T, $u); # 28.846 μs (1 allocation: 32 bytes)

D \otimes I - this is not so hot. Let's also try to this vector-wise

using SciMLOperators, LinearAlgebra, BenchmarkTools

u = rand(12^2, 100)
v = rand(12^2, 100)

A = rand(12, 12) # outer
B = SciMLOperators.IdentityOperator{12}() # inner

T = TensorProductOperator(A, B)
T = cache_operator(T, u)

@btime mul!($v, $T, $u); # 63.360 μs (7 allocations: 304 bytes)

# this is not too hot, let's try doing it vector-wise. 
u = [rand(12*12) for i=1:100]
v = [rand(12*12) for i=1:100]

T = cache_operator(T, u[1])

@btime for i=1:length($u) mul!($v[i], $T, $u[i]) end; # 228.426 μs (200 allocations: 2.00 MiB)

# this is worse :/

Batchwise component matvec

julia> using SciMLOperators, LinearAlgebra, BenchmarkTools; A = rand(12,12); u = [rand(12) for i=1:12*100]; v = copy(u); @btime for i=1:length($u); mul!($v[i], $A, $u[i]); end;
  104.498 μs (0 allocations: 0 bytes) # let's just not do batchwise

vpuri3 avatar Jun 18 '22 14:06 vpuri3

I added short circuits in TensorProductOperator mul! implementations for when we have IdentityOperators in tensor products.

https://github.com/SciML/SciMLOperators.jl/pull/59/commits/2eaede76a51b5cf96601e206020f004fc114002d

vpuri3 avatar Jun 18 '22 14:06 vpuri3