libxsmm
Limit K unrolling in AMX GEMM kernel
The AMX GEMM kernels currently fully unroll the K loop. This causes code buffer size issues, so for K >= 4096 we fall back to AVX-512 code generation. We could instead limit the unrolling of the K loop for large K values and still generate AMX code.
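To illustrate the idea, here is a minimal sketch of how a bounded unroll factor keeps the emitted code size independent of K. It is not libxsmm code: `k_unroll_plan`, `plan_k_unroll`, `emitted_steps`, and the `max_unroll` parameter are hypothetical names, and the K step per tile operation (`k_block`) is an assumed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plan for a partially unrolled K loop (not a libxsmm type). */
typedef struct {
  size_t k_steps;    /* number of k_block-sized steps needed to cover K */
  size_t unroll;     /* steps emitted back-to-back inside the loop body */
  size_t loop_iters; /* runtime iterations of that loop body */
  size_t tail_steps; /* leftover steps emitted once after the loop */
} k_unroll_plan;

/* Cap the unroll factor at max_unroll instead of unrolling all K steps. */
static k_unroll_plan plan_k_unroll(size_t k, size_t k_block, size_t max_unroll) {
  k_unroll_plan p;
  assert(k_block > 0 && max_unroll > 0 && k % k_block == 0);
  p.k_steps = k / k_block;
  p.unroll = (p.k_steps < max_unroll) ? p.k_steps : max_unroll;
  p.loop_iters = (p.unroll > 0) ? p.k_steps / p.unroll : 0;
  p.tail_steps = (p.unroll > 0) ? p.k_steps % p.unroll : 0;
  return p;
}

/* Steps actually written to the code buffer: bounded by 2 * max_unroll - 1. */
static size_t emitted_steps(const k_unroll_plan* p) {
  return p->unroll + p->tail_steps;
}
```

With an assumed `k_block` of 32 and `max_unroll` of 16, K = 4096 needs 128 steps, but only 16 are emitted in a loop that runs 8 times, whereas full unrolling would emit all 128.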
Addendum: We also seem to have undefined behavior when running out of code buffer space. Sometimes code generation exits gracefully and returns a NULL pointer; sometimes we just crash.
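One way to make the out-of-space case deterministic is to bounds-check every write and latch an error, so the generator always ends up returning NULL. The sketch below is only an illustration under that assumption: `code_buffer`, `code_buffer_append`, and `code_buffer_finish` are hypothetical names, not the libxsmm code generator's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity code buffer with a sticky error flag. */
typedef struct {
  unsigned char* data;
  size_t capacity;
  size_t size;
  int error; /* set once an append does not fit; never cleared */
} code_buffer;

/* Append n bytes; returns 0 on success, -1 if the buffer is (or was) full. */
static int code_buffer_append(code_buffer* buf, const void* bytes, size_t n) {
  assert(buf != NULL && (bytes != NULL || n == 0));
  if (buf->error) return -1;
  if (n > buf->capacity - buf->size) { /* would overflow: refuse the write */
    buf->error = 1;
    return -1;
  }
  if (n > 0) memcpy(buf->data + buf->size, bytes, n);
  buf->size += n;
  return 0;
}

/* Hand out the code only if every append succeeded; otherwise NULL. */
static const void* code_buffer_finish(const code_buffer* buf) {
  return buf->error ? NULL : buf->data;
}
```

Because the flag is sticky, emitters do not have to check every single append; a single check at the end is enough to fall back or report failure without ever writing past the buffer.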
Closed by PR #868