Bhaskar Nallani
Currently, MR and NR are the same for gemmsup and gemm for a given data type. Yes! It is better to switch to the l3_sup API mentioned below: const dim_t NR = bli_cntx_get_l3_sup_blksz_def_dt( dt,...
Yes, if you consider just your gemm case. But when we have sup for gemm, trsm, and gemmt, it is difficult to use the same l3_sup for all APIs if...
It looks like the issue is with the vector load in the fringe-case kernels, where the load tries to access memory out of bounds. I am able to reproduce the memory...
Hi @fgvanzee, here is the fix:
-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm4)
+vmovsd(mem(rcx), xmm5)
+vfmadd231ps(xmm5, xmm3, xmm4)
-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm6)
+vmovsd(mem(rcx), xmm5)
+vfmadd231ps(xmm5, xmm3, xmm6)
-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm8)
+vmovsd(mem(rcx),...
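To illustrate why the fix helps: a vfmadd231ps with an xmm memory operand reads 16 bytes (four floats), while vmovsd reads only 8 bytes (two floats) and zeroes the upper lanes of the destination register. The sketch below is a portable C model of that idea, not the kernel code itself. The vec4f type and the load_partial, fma4, and fringe_update helpers are hypothetical names made up for this example, and it assumes the fringe case has two valid floats left in the row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Load only the first n_valid floats and zero the remaining lanes, so
   nothing past the end of the buffer is read. With n_valid == 2 this
   models a 64-bit load such as vmovsd, which zeroes the upper lanes. */
static vec4f load_partial(const float *p, size_t n_valid)
{
    vec4f v = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    assert(n_valid <= 4);
    memcpy(v.lane, p, n_valid * sizeof(float));
    return v;
}

/* Lane-wise acc += a * b: the arithmetic of vfmadd231ps, modeled
   here as a separate multiply and add rather than a fused operation. */
static vec4f fma4(vec4f a, vec4f b, vec4f acc)
{
    for (size_t i = 0; i < 4; ++i)
        acc.lane[i] += a.lane[i] * b.lane[i];
    return acc;
}

/* Fringe update (assumed shape): load the valid tail of a row into a
   register first, then feed the register to the multiply-add instead
   of using the memory location directly as a full-width operand. */
static vec4f fringe_update(const float *c_row, size_t n_valid,
                           vec4f scale, vec4f acc)
{
    vec4f c = load_partial(c_row, n_valid);
    return fma4(c, scale, acc);
}
```

For example, calling fringe_update on a two-element buffer with n_valid set to 2 touches only those two floats; the two upper lanes contribute zero to the product, so the accumulator is unchanged there.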
But that function is already disabled under #if 0.
gemm, trsm, and gemmt are improved on zen2/3, and the improvements will be released as part of the upcoming AMD-BLIS release.