juice Fix Linear layer bias gradient computation; add size checks to CUDA functions

Open hweom opened this issue 2 years ago • 2 comments

Closes #169

Add an assert to the CUDA copy() function to check that the source and destination have the same size.
Add an assert to the CUDA gemm() function to check that the computed matrix multiplication dimensions match the sizes of the passed tensors.
Fix the Linear layer bias gradient computation by summing all gradients in the batch instead of copying the output gradient to the bias gradient (the copy is incorrect because the output gradient contains N items, not 1).
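
For illustration only, here is a minimal CPU-side sketch of the two ideas above. It is not the code from this PR. The function names `bias_gradient` and `check_gemm_dims`, the row-major `[batch, outputs]` layout, and the use of plain `f32` slices instead of CUDA tensors are all assumptions made for the example.

```rust
/// Hypothetical helper (not part of juice): computes the bias gradient of a
/// linear layer by summing the output gradient over the batch dimension.
/// Assumes `output_gradient` is a row-major [batch, outputs] buffer.
fn bias_gradient(output_gradient: &[f32], batch: usize, outputs: usize) -> Vec<f32> {
    assert!(outputs > 0, "layer must have at least one output");
    // Size check in the spirit of the new copy() assert: the buffer length
    // must agree with the declared shape.
    assert_eq!(
        output_gradient.len(),
        batch * outputs,
        "output gradient length does not match batch * outputs"
    );

    let mut bias_grad = vec![0.0f32; outputs];
    // Each chunk is the gradient for one sample; accumulate it element-wise.
    for sample in output_gradient.chunks_exact(outputs) {
        for (acc, g) in bias_grad.iter_mut().zip(sample) {
            *acc += *g;
        }
    }
    bias_grad
}

/// Hypothetical helper (not part of juice): checks that buffer lengths agree
/// with the dimensions of C[m, n] = A[m, k] * B[k, n].
fn check_gemm_dims(m: usize, n: usize, k: usize, a_len: usize, b_len: usize, c_len: usize) {
    assert_eq!(a_len, m * k, "A must hold m * k elements");
    assert_eq!(b_len, k * n, "B must hold k * n elements");
    assert_eq!(c_len, m * n, "C must hold m * n elements");
}
```

With a batch of 2 samples and 3 outputs, `bias_gradient(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3)` sums the two rows element-wise, whereas a plain copy would try to fit 6 values into a 3-element bias gradient.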

After this change, cuda-memcheck no longer finds errors.

All unit tests continue to pass except ui, which is also broken at HEAD.

Jul 28 '22 00:07 hweom

Test fixes are in #172.

Aug 11 '22 02:08 hweom

Could you rebase this PR onto master? I'll merge right away then.

Aug 11 '22 07:08 drahnr

Done.

Aug 14 '22 01:08 hweom