blis icon indicating copy to clipboard operation
blis copied to clipboard

GEMMSUP Handler Referring GEMM Kernel Size?

Open xrq-phys opened this issue 4 years ago • 4 comments

Yet another issue. Excuse me. 🤣

Just happened to notice that (obj_t).blkszs sometimes affects kernel selection of gemmsup which is, I guess, supposed to be only dominated by (obj_t).l3_sup_blkszs?

These lines seems to be a plausible cause: https://github.com/flame/blis/tree/7c3eb44/frame/3/bli_l3_sup_int.c#L95-L96 https://github.com/flame/blis/tree/7c3eb44/frame/3/bli_l3_sup_int.c#L269-L270

It's usually not much impact upon performance even when we use gemmsup kernel sizes different from gemm, but it could be a bit confusing when doing some debug works.

xrq-phys avatar Jun 03 '21 09:06 xrq-phys

Currently MR,NR are same between gemmsup or gemm for a given data type.

Yes! It is better to change to l3_sup api mentioned below.

const dim_t NR = bli_cntx_get_l3_sup_blksz_def_dt( dt, BLIS_NR, cntx ); const dim_t MR = bli_cntx_get_l3_sup_blksz_def_dt( dt, BLIS_MR, cntx );

At the same time it is required to create new tables and set/get functions for gemmsup, trsmsup, gemmtsup and similarly for other l3 api's so that all are independent.

BhaskarNallani avatar Jul 08 '21 04:07 BhaskarNallani

Seem already implemented: https://github.com/flame/blis/tree/fab5c86/frame/base/bli_cntx.h#L336

and used: https://github.com/flame/blis/tree/fab5c86/frame/3/bli_l3_sup_var12.c#L235-L239 https://github.com/flame/blis/tree/fab5c86/frame/3/bli_l3_sup_var1n2m.c#L292-L296 https://github.com/flame/blis/tree/fab5c86/frame/3/old/bli_l3_sup_var1n2m.c#L276-L280

Maybe it would be just OK to replace the lines in bli_l3_sup_int.c with the code you presented?

xrq-phys avatar Jul 14 '21 09:07 xrq-phys

Yes! if you consider just your gemm case.

But when we have sup for gemm, trsm , gemmt then it is difficult to use same l3_sup for all api's if they need different sizes, where we need a separate api's bli_cntx_get_l3_supgemm_blksz_def_dt() , bli_cntx_get_l3_suptrsm_blksz_def_dt() to be independent of each other.

We can also solve this having family id (gemm/trsm/gemmt ...etc) as an argument to the bli_cntx_get_l3_sup_blksz_def_dt to differentiate.

BhaskarNallani avatar Jul 14 '21 09:07 BhaskarNallani

It would be very nice to allow using different sizes for GEMM, GEMMT and TRMM, while currently the (non-sup) block kernel handler seems to require all kernel sizes to be equal? (There is only one bli_cntx_get_blksz_def_dt.)

Of course I would also imagine SUP cases to be much more complicated thus requiring a different way to handle...

xrq-phys avatar Jul 14 '21 09:07 xrq-phys