triton
[AMD] Support FP8E5M2 with MFMA FP16 instructions
Cast dot arguments from the unsupported FP8 type to the supported FP16 type so that MFMA instructions can be used instead of FMA. This approach is expected to perform better and be more stable than the FMA implementation.
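
For illustration only, here is a minimal NumPy sketch of why widening FP8E5M2 to FP16 is lossless. FP8E5M2 has 1 sign bit, 5 exponent bits and 2 mantissa bits, with the same exponent bias as FP16, so each FP8E5M2 byte is the high byte of the equivalent FP16 value. The helper name `fp8e5m2_bits_to_fp16` is hypothetical, and this is not the lowering code from this change; it only models the numeric effect of the cast on the host.

```python
import numpy as np


def fp8e5m2_bits_to_fp16(bits):
    """Widen raw FP8E5M2 bytes to float16 (hypothetical helper, host-side model only).

    Each byte is moved into the high 8 bits of a 16-bit word. The low 8 bits,
    which hold the extra FP16 mantissa bits, are left as zero.
    """
    raw = np.asarray(bits, dtype=np.uint8)
    widened = np.left_shift(raw.astype(np.uint16), np.uint16(8))
    return widened.view(np.float16)


# Raw FP8E5M2 encodings: 0x3C = 1.0, 0x40 = 2.0, 0x3E = 1.5, 0xC0 = -2.0
a = fp8e5m2_bits_to_fp16([0x3C, 0x40])
b = fp8e5m2_bits_to_fp16([0x3E, 0xC0])

# Dot product on the widened FP16 operands, accumulated in float32.
result = np.dot(a.astype(np.float32), b.astype(np.float32))
```

Because the cast is exact, the FP16 operands carry the same values as the original FP8E5M2 inputs, so any numerical difference from the FMA path would come from the accumulation, not from the conversion.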