Bhaskar Nallani

6 comments by Bhaskar Nallani

Currently, MR and NR are the same for gemmsup and gemm for a given data type. Yes, it is better to switch to the l3_sup API mentioned below. const dim_t NR = bli_cntx_get_l3_sup_blksz_def_dt( dt,...

Yes, if you consider just your gemm case. But when we have sup for gemm, trsm, and gemmt, it is difficult to use the same l3_sup for all APIs if...

It looks like the issue is with the vector loads in the fringe-case kernels, where a load tries to access memory out of bounds. I am able to reproduce the memory...

Hi @fgvanzee, here is the fix:

-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm4)
+vmovsd(mem(rcx), xmm5)
+vfmadd231ps(xmm5, xmm3, xmm4)
-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm6)
+vmovsd(mem(rcx), xmm5)
+vfmadd231ps(xmm5, xmm3, xmm6)
-vfmadd231ps(mem(rcx, 0*32), xmm3, xmm8)
+vmovsd(mem(rcx),...
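For readers less familiar with the pattern, the idea behind the fix can be sketched in portable C. A vfmadd231ps with an xmm memory operand reads a full 16 bytes (four floats), while vmovsd reads only 8 bytes (two floats) and zeroes the rest of the register, so loading first and then doing the FMA on registers avoids touching memory past the fringe. The sketch below is only an illustration of that idea, not the kernel code: the helper names load_fringe and fma_fringe, the VEC_WIDTH constant, and the scale parameter are hypothetical, and it uses a zero-padded temporary buffer in place of the actual vmovsd instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_WIDTH 4 /* floats in one 128-bit (xmm) register */

/* Hypothetical helper: copy only the n_valid floats that really exist
 * into a zero-padded local vector, so nothing past src[n_valid - 1]
 * is ever read. This plays the role of the narrow load in the fix. */
static void load_fringe(const float *src, size_t n_valid, float dst[VEC_WIDTH])
{
    assert(n_valid <= VEC_WIDTH);
    memset(dst, 0, VEC_WIDTH * sizeof(float));
    memcpy(dst, src, n_valid * sizeof(float));
}

/* Hypothetical helper: acc[i] += scale * c[i] for the valid elements.
 * The multiply-add runs on the padded copy, not on memory directly,
 * mirroring an FMA that takes a register operand instead of mem(rcx). */
static void fma_fringe(const float *c, size_t n_valid, float scale,
                       float acc[VEC_WIDTH])
{
    float tmp[VEC_WIDTH];
    load_fringe(c, n_valid, tmp);
    for (size_t i = 0; i < VEC_WIDTH; ++i)
        acc[i] += scale * tmp[i]; /* padded lanes add zero */
}
```

With a two-element source and n_valid = 2, only the first two accumulator lanes change; the padded lanes contribute zero, which matches the behaviour of a narrow load that zeroes the upper part of the register.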

But that function is already disabled under #if 0.

gemm, trsm, and gemmt have been improved on zen2/3, and the improvements will be released as part of the upcoming AMD-BLIS release.