Md Asghar Ahmad Shahid
Results: 2 issues
Implements the lowering of the vector contraction op to vector outerproduct ops wrapped inside an scf.for loop, with iter_args to accumulate the result of each outerproduct corresponding to the K dimension size. The...
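As a rough illustration of the idea in this description (not the actual MLIR lowering), the sketch below computes a matrix contraction as K rank-1 outer-product updates, with the accumulator carried from one loop iteration to the next in the way iter_args would carry it. The function name `contract_via_outer_products` and the list-of-lists layout are assumptions made for this example only.

```python
def contract_via_outer_products(a, b, acc):
    """Return acc + a @ b, computed as a sum of K outer products.

    a is M x K, b is K x N, acc is the M x N initial accumulator
    (all plain lists of lists; this layout is an illustrative assumption).
    """
    m, k_size, n = len(a), len(b), len(b[0])
    # The accumulator plays the role of the loop-carried value (iter_args).
    result = [row[:] for row in acc]
    for k in range(k_size):  # one iteration per index of the K dimension
        col = [a[i][k] for i in range(m)]  # column k of A, length M
        row = b[k]                         # row k of B, length N
        # Outer product of col and row, fused with the accumulate step.
        result = [[result[i][j] + col[i] * row[j] for j in range(n)]
                  for i in range(m)]
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
zeros = [[0, 0], [0, 0]]
print(contract_via_outer_products(a, b, zeros))  # [[19, 22], [43, 50]]
```

Each iteration consumes one column of the left operand and one row of the right operand, so the loop trip count equals the K dimension size.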
FP32 BRGEMM can be lowered using FMAs, but this cannot be used for BF16 inputs. Intel AMX has a TMUL functional unit which provides tile registers of size 16x16 for...
benchmark-full