CUDA.jl

Added support for kron

Open • JATippit opened this pull request 4 years ago • 8 comments

Currently, calling kron on CuArrays falls back to scalar indexing and emits a warning. This PR adds a kernel for kron along with an accompanying higher-level function.

JATippit · Apr 07 '21 01:04

Codecov Report

Merging #814 (694039f) into master (4e81f3d) will decrease coverage by 0.39%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #814      +/-   ##
==========================================
- Coverage   78.03%   77.64%   -0.39%     
==========================================
  Files         120      121       +1     
  Lines        7383     7420      +37     
==========================================
  Hits         5761     5761              
- Misses       1622     1659      +37     
| Impacted Files   | Coverage Δ        |
|------------------|-------------------|
| src/CUDA.jl      | 100.00% <ø> (ø)   |
| src/kronecker.jl | 0.00% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4e81f3d...694039f.

codecov[bot] · Apr 07 '21 02:04

Is this quicker than just reshaping and broadcasting, with something like this?

function kron2d(A, B)
    # Insert singleton dimensions so broadcasting pairs every element of A with every element of B
    A4 = reshape(A, 1, size(A,1), 1, size(A,2))
    B4 = reshape(B, size(B,1), 1, size(B,2), 1)
    C4 = A4 .* B4
    # Collapse back to the (m1*m2) × (n1*n2) Kronecker layout
    reshape(C4, size(A,1) * size(B,1), size(A,2) * size(B,2))
end

A = [1 2 3 4; 5 6 7 8]; B = [1 10; 100 1000];
kron(A,B) == kron2d(A,B)

mcabbott · Apr 08 '21 17:04

> Is this quicker than just reshaping and broadcasting, with something like this? […]

Your function actually executes faster on my machine.

JATippit · Apr 12 '21 03:04

> Your function actually executes faster on my machine.

A custom kernel seems unnecessary then?

maleadt · Apr 12 '21 06:04

> A custom kernel seems unnecessary then?

I agree; honestly, I didn't know reshape would work this way.

JATippit · Apr 12 '21 15:04

Even though we don't need a custom kernel, we'd still need to provide a kron method that uses this implementation, so this didn't have to be closed.

maleadt · Apr 12 '21 15:04

Would this be better added to a pre-existing file rather than to a standalone kronecker.jl, then?

JATippit · Apr 12 '21 15:04

> I agree; honestly, I didn't know reshape would work this way.

You may be interested in TensorCast.jl, my package for taking this kind of reshape-and-broadcast manipulation seriously.

It's possible that some broadcasting implementation should live in Base, with the present explicit-loop implementation restricted to Array or similar; the loops still seem faster on small CPU arrays. Not a short-term solution, of course. I think it would also solve https://github.com/JuliaLang/julia/issues/40439.

mcabbott · Apr 12 '21 15:04