juice
Fix Linear layer bias gradient computation; add size checks to CUDA functions
What does this PR accomplish?
- 🩹 Bug Fix
Closes #169
Changes proposed by this PR:
- Add an assert to CUDA `copy()` function to check that source and destination have the same size.
- Add an assert to CUDA `gemm()` function to check that computed matrix multiplication dimensions match the passed tensor sizes.
- Fix the Linear layer bias gradient computation by summing all gradients in the batch instead of copying the output gradient to the bias gradient (which is incorrect since the output gradient contains N items, not 1).
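For reviewers, here is a rough CPU-side sketch of the two ideas above. It is not the code in this PR: `check_gemm_dims` and `bias_gradient` are hypothetical names, and the row-major `[batch_size, out_features]` layout of the output gradient is an assumption made for illustration.

```rust
/// Hypothetical helper (not Juice's API): checks that the buffer lengths
/// agree with the dimensions of C[m, n] = A[m, k] * B[k, n].
fn check_gemm_dims(a_len: usize, b_len: usize, c_len: usize, m: usize, k: usize, n: usize) {
    assert_eq!(a_len, m * k, "A does not hold m * k elements");
    assert_eq!(b_len, k * n, "B does not hold k * n elements");
    assert_eq!(c_len, m * n, "C does not hold m * n elements");
}

/// Hypothetical helper (not Juice's API): computes the bias gradient by
/// summing the output gradient over the batch dimension.
/// `output_grad` is assumed to be row-major with shape [batch_size, out_features].
fn bias_gradient(output_grad: &[f32], batch_size: usize, out_features: usize) -> Vec<f32> {
    assert!(out_features > 0, "out_features must be non-zero");
    // Size check in the same spirit as the asserts added to copy() and gemm().
    assert_eq!(
        output_grad.len(),
        batch_size * out_features,
        "output gradient does not hold batch_size * out_features elements"
    );

    // One accumulator per bias element; each batch row adds its contribution.
    let mut bias_grad = vec![0.0f32; out_features];
    for row in output_grad.chunks_exact(out_features) {
        for (acc, g) in bias_grad.iter_mut().zip(row) {
            *acc += *g;
        }
    }
    bias_grad
}
```

With a batch of two rows `[1, 2, 3]` and `[4, 5, 6]`, `bias_gradient` returns `[5, 7, 9]`. Copying the output gradient into the bias gradient would instead try to write six values into a three-element buffer, which is the kind of out-of-bounds access the new size checks are meant to catch.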
After this change, `cuda-memcheck` no longer finds errors.
Notes to reviewer:
All unit tests continue to pass except `ui`, which is also broken at HEAD.
📜 Checklist
- [x] The `juice-examples` run just fine
Test fixes are in #172.
Could you rebase this PR onto master? I'll merge right away then.
Done.