tpp-mlir
Support lowering of vector.contract to AMX for brgemm
FP32 brgemm can be lowered using FMAs, but that approach cannot be
used for BF16 inputs.
Intel AMX has a TMUL functional unit that provides tile registers
of size 16x16 for the BF16 data type, along with the corresponding
load, store, and multiply instructions. This pass lowers the tiled
brgemm from the vector dialect to the AMX dialect, which is
subsequently lowered to AMX instructions.
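
For readers unfamiliar with brgemm, the following is a minimal Python sketch of the computation being tiled. It is not code from this pass and does not use any tpp-mlir or MLIR API: the names `to_bf16`, `brgemm_reference`, and `brgemm_tiled` are hypothetical, the 16x16 tile size is taken from the description above, and BF16 is approximated by truncating an f32 value (a simplification; hardware conversion may round differently). Inputs are BF16-precision values and accumulation happens at higher precision, mirroring the load-tile, multiply, store-tile structure only loosely.

```python
import struct

TILE = 16  # tile edge, taken from the description above


def to_bf16(x):
    """Approximate BF16 by keeping the top 16 bits of the f32 encoding (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def brgemm_reference(a, b, c):
    """C[m][n] += sum over batch and k of A[batch][m][k] * B[batch][k][n]."""
    for a_mat, b_mat in zip(a, b):
        for m in range(len(c)):
            for n in range(len(c[0])):
                for k in range(len(b_mat)):
                    c[m][n] += a_mat[m][k] * b_mat[k][n]
    return c


def brgemm_tiled(a, b, c, tile=TILE):
    """Same computation, organized around tile-sized accumulators."""
    rows, cols, depth = len(c), len(c[0]), len(b[0])
    for m0 in range(0, rows, tile):
        for n0 in range(0, cols, tile):
            # "Load" the accumulator tile once per output block.
            acc = [row[n0:n0 + tile] for row in c[m0:m0 + tile]]
            # Reduce over the batch and over K in tile-sized steps.
            for a_mat, b_mat in zip(a, b):
                for k0 in range(0, depth, tile):
                    for i in range(len(acc)):
                        for j in range(len(acc[0])):
                            for k in range(k0, min(k0 + tile, depth)):
                                acc[i][j] += a_mat[m0 + i][k] * b_mat[k][n0 + j]
            # "Store" the accumulator tile back into C.
            for i, row in enumerate(acc):
                c[m0 + i][n0:n0 + len(row)] = row
    return c
```

Both functions accumulate each output element in the same order (batch first, then increasing k), so they should agree on the same inputs; the tiled version only changes how the loop nest is grouped.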