warp
[DOCS] Add caveats for the adjoint of C in wp.matmul()
Category
- [ ] Report an error in the documentation.
- [x] Request for something to be documented.
- [ ] Suggestion to improve the documentation.
- [ ] Other (please explain)
Description
Document the differentiability nuances of wp.tile_matmul, in particular the bias term.
Because C is overwritten in place in
C = alpha * A * B + beta * C
the adjoint of C must be handled with care in gradient calculations. The result will only be correct if C is passed to linear functions.
In nonlinear graphs, the matrix multiplication can be rewritten using other builtins, but this will be slower.
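As a rough illustration of the caveat, here is a pure-Python sketch (it does not call Warp, and every helper name in it is hypothetical). It assumes the usual adjoints of `D = alpha * A * B + beta * C`, namely `adj_A = alpha * adj_D * B^T`, `adj_B = alpha * A^T * adj_D`, and `adj_C = beta * adj_D`. It also assumes one plausible failure mode: C is also fed to a nonlinear function (an elementwise square) whose backward pass needs the original values of C. This may not match exactly what Warp does internally.

```python
# Pure-Python sketch, NOT Warp code. All names here are hypothetical.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def scale(s, X):
    return [[s * x for x in row] for row in X]

def gemm(alpha, A, B, beta, C):
    """Out-of-place D = alpha * A * B + beta * C."""
    AB = matmul(A, B)
    return [[alpha * AB[i][j] + beta * C[i][j] for j in range(len(C[0]))]
            for i in range(len(C))]

def gemm_adjoint(alpha, A, B, beta, adj_D):
    """Adjoints of A, B, and the input C given the adjoint of the output D."""
    adj_A = scale(alpha, matmul(adj_D, transpose(B)))
    adj_B = scale(alpha, matmul(transpose(A), adj_D))
    adj_C = scale(beta, adj_D)
    return adj_A, adj_B, adj_C

def loss(alpha, A, B, beta, C):
    # C is used twice: linearly inside the gemm and nonlinearly in sum(C * C).
    D = gemm(alpha, A, B, beta, C)
    return sum(map(sum, D)) + sum(c * c for row in C for c in row)

alpha, beta = 2.0, 0.5
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
C = [[1.0, -1.0], [2.0, 0.5]]

D = gemm(alpha, A, B, beta, C)
adj_D = [[1.0, 1.0], [1.0, 1.0]]          # d(sum(D)) / dD
_, _, adj_C_gemm = gemm_adjoint(alpha, A, B, beta, adj_D)

# Correct gradient: the square's backward pass uses the ORIGINAL C.
grad_correct = [[adj_C_gemm[i][j] + 2.0 * C[i][j] for j in range(2)]
                for i in range(2)]

# If C had been overwritten by D, the square's backward pass would read D.
grad_stale = [[adj_C_gemm[i][j] + 2.0 * D[i][j] for j in range(2)]
              for i in range(2)]

def finite_diff(i, j, h=1e-6):
    """Central-difference check of dL/dC[i][j]."""
    Cp = [row[:] for row in C]
    Cm = [row[:] for row in C]
    Cp[i][j] += h
    Cm[i][j] -= h
    return (loss(alpha, A, B, beta, Cp) - loss(alpha, A, B, beta, Cm)) / (2 * h)
```

In this sketch `grad_correct` agrees with the finite-difference check, while `grad_stale` does not. If C only feeds linear functions, their backward passes never read the values of C, so the overwrite is harmless. That is consistent with the caveat described above.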