CUDA.jl

Added support for kron

Open • JATippit opened this pull request 4 years ago • 8 comments

Currently, calling kron on CuArrays falls back to scalar indexing and emits a warning. This PR adds a kernel for kron along with an accompanying higher-level function.

JATippit · Apr 07 '21 01:04

Codecov Report

Merging #814 (694039f) into master (4e81f3d) will decrease coverage by 0.39%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #814      +/-   ##
==========================================
- Coverage   78.03%   77.64%   -0.39%     
==========================================
  Files         120      121       +1     
  Lines        7383     7420      +37     
==========================================
  Hits         5761     5761              
- Misses       1622     1659      +37     
| Impacted Files   | Coverage Δ        |
|------------------|-------------------|
| src/CUDA.jl      | 100.00% <ø> (ø)   |
| src/kronecker.jl | 0.00% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4e81f3d...694039f.

codecov[bot] · Apr 07 '21 02:04

Is this quicker than just reshaping and broadcasting, with something like this?

function kron2d(A, B)
    # Insert singleton dimensions so broadcasting pairs every element of A with every element of B
    A4 = reshape(A, 1, size(A,1), 1, size(A,2))
    B4 = reshape(B, size(B,1), 1, size(B,2), 1)
    C4 = A4 .* B4
    # Collapse back to the (m1*m2) × (n1*n2) Kronecker layout
    reshape(C4, size(A,1) * size(B,1), size(A,2) * size(B,2))
end

A = [1 2 3 4; 5 6 7 8]; B = [1 10; 100 1000];
kron(A,B) == kron2d(A,B)

mcabbott · Apr 08 '21 17:04

> Is this quicker than just reshaping and broadcasting, with something like this? […]

Your function actually executes faster on my machine.

JATippit · Apr 12 '21 03:04

> Your function actually executes faster on my machine.

A custom kernel seems unnecessary then?

maleadt · Apr 12 '21 06:04

> A custom kernel seems unnecessary then?

I agree; honestly, I didn't know reshape would work this way.

JATippit · Apr 12 '21 15:04

Even though we don't need a custom kernel, we'd still need to provide a kron method that uses this implementation, so this didn't have to be closed.

maleadt · Apr 12 '21 15:04

Would this be better added to a pre-existing file rather than to a standalone kronecker.jl, then?

JATippit · Apr 12 '21 15:04

> I agree; honestly, I didn't know reshape would work this way.

You may be interested in TensorCast.jl, my package for taking this kind of reshape-and-broadcast manipulation seriously.

It's possible that some broadcasting implementation should live in Base, with the present explicit-loop implementation restricted to Array or similar; the loops still seem faster on small CPU arrays. Not a short-term solution, of course. I think it would also solve https://github.com/JuliaLang/julia/issues/40439.

mcabbott · Apr 12 '21 15:04