how-to-optimize-gemm
Add missing loop over block columns of C
The block column width of C is set by #define nb 1000 and is used to size the buffers, but nb is not used anywhere else.
As a result, if the m-by-n matrix C has n > nb, the buffers are too small and the code hits a segmentation fault. This commit fixes the issue by
- adding the missing loop over the block columns of C, and
- setting the default leading dimensions to the row count of each matrix.
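
The two changes above can be illustrated with a minimal sketch. It is not the code from this commit: the function name gemm_blocked, the constants NB and KC, and the simple packing scheme are assumptions made for illustration, and column-major storage is assumed. The point is that the packed buffer holds at most NB columns of B, so the outer loop must walk C in block columns of width at most NB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes. The commit uses nb = 1000; small values are
   used here so that a tiny matrix already spans several blocks. */
#define NB 4 /* block column width of C */
#define KC 3 /* block size along the k dimension */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Computes C += A * B for column-major matrices:
   A is m-by-k, B is k-by-n, C is m-by-n. */
void gemm_blocked(int m, int n, int k,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double *c, int ldc)
{
    /* The buffer holds at most NB columns of B. Without the loop over j
       below, any n > NB would write past its end. */
    static double packed_b[KC * NB];

    /* In column-major storage a leading dimension must be at least the
       row count, which is why the row count is a natural default. */
    assert(lda >= m && ldb >= k && ldc >= m);

    /* Loop over block columns of C (the loop the commit adds). */
    for (int j = 0; j < n; j += NB) {
        int jb = min_int(n - j, NB); /* width of this block column */

        for (int p = 0; p < k; p += KC) {
            int pb = min_int(k - p, KC);

            /* Pack the pb-by-jb block of B into contiguous memory. */
            for (int jj = 0; jj < jb; jj++)
                for (int pp = 0; pp < pb; pp++)
                    packed_b[jj * pb + pp] =
                        b[(size_t)(j + jj) * ldb + (p + pp)];

            /* Update the current block column of C. */
            for (int jj = 0; jj < jb; jj++)
                for (int pp = 0; pp < pb; pp++) {
                    double bval = packed_b[jj * pb + pp];
                    for (int i = 0; i < m; i++)
                        c[(size_t)(j + jj) * ldc + i] +=
                            a[(size_t)(p + pp) * lda + i] * bval;
                }
        }
    }
}
```

For densely stored matrices, a call would pass the row counts as leading dimensions, for example gemm_blocked(m, n, k, a, m, b, k, c, m). The static buffer keeps the sketch short but makes the function unsafe to call from several threads at once.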