juice
Use pointwise multiply instead of BLAS gemm for bias
Currently a gemm operation is used to calculate the bias, which is an
O(n^3) operation, where it should simply be a sum per SharedTensor
element.
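
A rough sketch of the difference, using plain `f32` slices rather than the real SharedTensor API. The function names and the row-major `batch x n_out` layout are assumptions made for illustration, not code from juice. The first function adds the bias element by element; the second mimics the gemm-style update `output += ones * bias` that it would replace.

```rust
/// Hypothetical helper: adds `bias` (length `n_out`) to every row of
/// `output`, which is assumed to be row-major with shape `batch x n_out`.
/// This is one addition per output element.
fn add_bias_elementwise(output: &mut [f32], bias: &[f32]) {
    assert!(!bias.is_empty(), "bias must not be empty");
    assert_eq!(output.len() % bias.len(), 0, "output must hold whole rows");
    for row in output.chunks_mut(bias.len()) {
        for (o, b) in row.iter_mut().zip(bias) {
            *o += *b;
        }
    }
}

/// Hypothetical reference: the same update written as a matrix product,
/// output += ones(batch x 1) * bias(1 x n_out). It allocates a ones vector
/// and goes through a multiply-accumulate for every element.
fn add_bias_gemm_style(output: &mut [f32], bias: &[f32], batch: usize) {
    let n_out = bias.len();
    assert_eq!(output.len(), batch * n_out, "output must be batch x n_out");
    let ones = vec![1.0f32; batch];
    for i in 0..batch {
        for j in 0..n_out {
            // The inner dimension of this product has length 1.
            output[i * n_out + j] += ones[i] * bias[j];
        }
    }
}
```

Both functions should give the same result for the same inputs; the elementwise version just avoids the extra ones vector and the BLAS call.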