
[WIP] Support small dots and optimization of dot operands

Open • binarman opened this issue 7 months ago • 2 comments

This PR

  • Introduces several fixes in the FMA dot implementation
  • Enables support for small dots with M, N and K dimensions down to 1
  • Introduces a dot operand optimization for dots with small M (<= 8) and large N and K dimensions
  • Introduces generation of v_dot2/v_dot4 instructions for the AMD backend
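
For readers unfamiliar with dot4-style instructions, here is a rough Python emulation of the general idea: a 4-element dot product over 8-bit lanes packed into a 32-bit word, added to an accumulator. This is only an illustrative sketch and is not taken from the PR. The helper names `pack_i8` and `dot4_i8` are hypothetical, the lanes are assumed to be signed, and 32-bit wraparound or saturation of the accumulator is not modelled; the actual v_dot2/v_dot4 variants differ in operand types and overflow behaviour.

```python
def pack_i8(vals):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    assert len(vals) == 4
    packed = 0
    for i, v in enumerate(vals):
        assert -128 <= v <= 127
        packed |= (v & 0xFF) << (8 * i)
    return packed


def unpack_i8(word):
    """Split a 32-bit word into four signed 8-bit lanes."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        # Reinterpret the unsigned byte as a two's-complement signed value.
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dot4_i8(a_packed, b_packed, acc):
    """Hypothetical emulation: acc + sum of the four lane-wise products.

    Assumption: signed lanes, no saturation or 32-bit wraparound modelled.
    """
    a = unpack_i8(a_packed)
    b = unpack_i8(b_packed)
    return acc + sum(x * y for x, y in zip(a, b))


result = dot4_i8(pack_i8([1, 2, 3, 4]), pack_i8([5, 6, 7, 8]), 10)
```

One such operation replaces four separate multiply-adds along K, which is why it is attractive for FMA-style dots on low-precision operands.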

These changes are needed to reduce granularity loss on dots with a small M or N dimension, such as (1x256)x(256x64).
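
One way to read "granularity loss" is as the share of work wasted when a dot has to be padded up to a minimum tile size. The sketch below estimates that for the (1x256)x(256x64) example. The function `padded_utilization` and the minimum tile size of 16 are illustrative assumptions, not values taken from this PR.

```python
import math


def padded_utilization(m, n, k, min_m, min_n, min_k):
    """Fraction of multiply-adds that are useful after padding M, N, K
    up to multiples of the (assumed) minimum tile sizes."""
    pm = math.ceil(m / min_m) * min_m
    pn = math.ceil(n / min_n) * min_n
    pk = math.ceil(k / min_k) * min_k
    return (m * n * k) / (pm * pn * pk)


# (1x256)x(256x64): M=1, K=256, N=64.
# Assumed minimum tile of 16 in every dimension (hypothetical value).
with_padding = padded_utilization(1, 64, 256, 16, 16, 16)

# If dimensions down to 1 are supported, no padding is required.
without_padding = padded_utilization(1, 64, 256, 1, 1, 1)

print(f"useful work with M padded to 16: {with_padding:.4f}")
print(f"useful work with M=1 supported:  {without_padding:.4f}")
```

Under these assumptions only 1/16 of the computed products contribute to the result when M is padded from 1 to 16, which is the kind of waste that supporting small dots avoids.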

binarman, Jul 26 '24 17:07