Md Asghar Ahmad Shahid

2 issues by Md Asghar Ahmad Shahid

Implements the lowering of the vector contraction op to vector outer products wrapped inside an scf.for loop, using iter_args to accumulate the result of each outer product across the K dimension (one outer product per K iteration). The...
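
The idea behind this lowering can be illustrated with a small sketch. This is not the MLIR implementation; it is a plain-Python model, assuming row-major lists of lists for A (M x K), B (K x N) and the accumulator C (M x N). The helper names outer and contract_via_outerproducts are hypothetical. The loop over k plays the role of the scf.for, and the loop-carried acc plays the role of the iter_args value that collects each rank-1 update.

```python
def outer(a_col, b_row):
    """Rank-1 outer product: result[m][n] = a_col[m] * b_row[n]."""
    return [[x * y for y in b_row] for x in a_col]


def contract_via_outerproducts(A, B, C):
    """Compute C + A @ B as a sum of K outer products.

    Hypothetical model of the lowering: each loop iteration extracts
    column k of A and row k of B, forms their outer product, and adds
    it to the loop-carried accumulator (the analogue of iter_args).
    """
    M, K, N = len(A), len(B), len(B[0])
    acc = [row[:] for row in C]  # initial accumulator value
    for k in range(K):  # analogue of the scf.for over the K dimension
        a_col = [A[m][k] for m in range(M)]
        b_row = B[k]
        op = outer(a_col, b_row)
        # Accumulate this outer product into the carried value.
        acc = [[acc[m][n] + op[m][n] for n in range(N)] for m in range(M)]
    return acc  # final value yielded by the loop


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
print(contract_via_outerproducts(A, B, C))  # [[19, 22], [43, 50]]
```

Summing K rank-1 updates gives the same result as the contraction, which is why the K dimension maps naturally onto the loop trip count.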

FP32 brgemm can be lowered using FMAs, but this cannot be used for BF16 inputs. Intel AMX has a TMUL functional unit which provides tile registers of size 16x16 for...

benchmark-full