Field G. Van Zee
I'll take a look at it.
Uhh, am I missing something, @devinamatthews?

```
$ ls ref_kernels/3
bli_gemm_ref.c  bli_gemmsup_ref.c  bli_gemmtrsm_ref.c  bli_trsm_ref.c  old
```

It's not clear what I need to fix.
> The bb kernels live in some strange place. Oh wait, didn't we fold them into the conventional reference microkernels?
@devinamatthews Ah, I think I see the problem now. It's not that the `gemmtrsm` ukernels lack edge-case handling; it's that they haven't been updated according to the latest way that...
@devinamatthews Ah, yes, I had previously overlooked this:

```c
const inc_t rs_b = packnr; \
const inc_t cs_b = bli_cntx_get_blksz_def_dt( dt, BLIS_BBN, cntx ); \
```
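For anyone following along, here is a minimal sketch of what those two strides imply for indexing a packed micropanel of B. It assumes that each element of B is stored `bb` times in a row (the broadcast factor read from `BLIS_BBN`) and that `packnr` is the padded row length of the panel. The helpers `pack_b_broadcast` and `packed_b_at` are hypothetical names made up for illustration and are not BLIS functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): pack a k x nr row-major slab of B
   into a micropanel in which every element is duplicated bb times.
   Assumes packnr >= nr * bb; any padding at the end of a row is zeroed. */
static void pack_b_broadcast( const double* b, size_t k, size_t nr,
                              size_t bb, size_t packnr, double* p )
{
    assert( packnr >= nr * bb );

    for ( size_t l = 0; l < k; ++l )
    {
        for ( size_t j = 0; j < nr; ++j )
            for ( size_t d = 0; d < bb; ++d )
                p[ l*packnr + j*bb + d ] = b[ l*nr + j ];

        /* Zero-fill the padding between nr*bb and packnr. */
        for ( size_t q = nr * bb; q < packnr; ++q )
            p[ l*packnr + q ] = 0.0;
    }
}

/* Hypothetical accessor: element (l, j) of the packed panel, using the same
   stride convention as the snippet above (rs_b = packnr, cs_b = bb). */
static double packed_b_at( const double* p, size_t l, size_t j,
                           size_t rs_b, size_t cs_b )
{
    return p[ l*rs_b + j*cs_b ];
}
```

Under these assumptions, a kernel that hard-codes `cs_b = 1` would step through the duplicates of a single element rather than through distinct columns of B, which is why the column stride has to come from the broadcast factor.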
Not sure if this is relevant, but I just realized this morning that *unless* @devinamatthews's recent changes (vis-a-vis merging the bb `packm` functionality into a single reference kernel) extend bb...
Yes, that's right. `hemm`/`symm` also require their own macros (because the Hermitian and symmetric packing formats presumably would not support explicit broadcast either). The same goes for `trmm3` (for the same reasons as `trmm`).
Also just noticed that `power10`'s subconfig registers preferences for complex `gemm` ukernels without registering any actual complex ukernels. :thinking:

```c
void bli_cntx_init_power10( cntx_t* cntx )
{
    blksz_t blkszs[ BLIS_NUM_BLKSZS ];
    ...
```
Another observation: `power10` uses row-preferential ukernels (which use broadcast values of A, not B) but has broadcast-B values defined in `config/power10/bli_kernel_defs_power10.h`. :thinking: It's a wonder any of this `power10` code...
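To make the A-versus-B distinction concrete, here is a rough sketch of a single rank-1 update of a microtile written both ways. This is plain scalar C for illustration only; the function names are hypothetical and nothing here is taken from the `power10` kernels. The point is only which operand gets treated as a broadcast scalar in each case.

```c
#include <stddef.h>

/* Hypothetical illustration (not BLIS code): row-preferential update.
   C is mr x nr, stored row-major. Each scalar a[i] is broadcast across a
   contiguous row of B, so it is A whose values get broadcast. */
static void rank1_row_pref( size_t mr, size_t nr,
                            const double* a, const double* b, double* c )
{
    for ( size_t i = 0; i < mr; ++i )
    {
        const double ai = a[ i ];           /* broadcast value from A */
        for ( size_t j = 0; j < nr; ++j )   /* contiguous row of B    */
            c[ i*nr + j ] += ai * b[ j ];
    }
}

/* Hypothetical illustration (not BLIS code): column-preferential update.
   C is mr x nr, stored column-major. Each scalar b[j] is broadcast down a
   contiguous column of A, so it is B whose values get broadcast. */
static void rank1_col_pref( size_t mr, size_t nr,
                            const double* a, const double* b, double* c )
{
    for ( size_t j = 0; j < nr; ++j )
    {
        const double bj = b[ j ];           /* broadcast value from B    */
        for ( size_t i = 0; i < mr; ++i )   /* contiguous column of A    */
            c[ i + j*mr ] += a[ i ] * bj;
    }
}
```

If that picture is right, a pre-broadcast packing format only helps the operand that the ukernel would otherwise have to broadcast itself, so a broadcast-B definition paired with a row-preferential ukernel looks mismatched.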
Okay, I'm going to consult with @nicholaiTukanov, the original author of the `power10` subconfig and kernel set, to make sure I understand how things *should* be registered before I begin...