
[Feature Request] Add FP8 GEMM to Ada (SM89)

Open DD-DuDa opened this issue 1 year ago • 6 comments

Hello, it's my understanding that Triton currently does not support the FP8 tensor cores on Ada Lovelace GPUs (SM_89); is that correct? I've noticed that cuBLAS has implemented support for them, and TransformerEngine also provides FP8 support for Ada.

I am eagerly hoping for prompt support so that we can use FP8 on the 4090 as well. 🥺

I would greatly appreciate your help!

DD-DuDa avatar Jan 30 '24 12:01 DD-DuDa

Does https://github.com/NVIDIA/TransformerEngine use cuBLAS?

Jokeren avatar Jan 30 '24 13:01 Jokeren

> Does https://github.com/NVIDIA/TransformerEngine use cuBLAS?

Yes.

https://github.com/NVIDIA/TransformerEngine/blob/b5e13a16611be162538f489f3fd7096518640e15/transformer_engine/common/gemm/cublaslt_gemm.cu#L41

DD-DuDa avatar Jan 31 '24 11:01 DD-DuDa

Are you aware of any PTX instructions that support FP8 on SM89?

Jokeren avatar Jan 31 '24 15:01 Jokeren

> Are you aware of any PTX instructions that support FP8 on SM89?

CUDA 12.4 or newer may support it now.

See: https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/include/cutlass/arch/mma_sm89.h#L57

DD-DuDa avatar Mar 26 '24 02:03 DD-DuDa

Yeah, I think so.

Jokeren avatar Mar 26 '24 13:03 Jokeren

PyTorch 2.3 has support for FP8 GEMM on Ada Lovelace, which we use in vLLM; it would be great to have this supported in Triton. PyTorch PR: https://github.com/pytorch/pytorch/pull/118881

mgoin avatar May 02 '24 16:05 mgoin

@DD-DuDa I tested this on my 4090 and it works now with CUDA 12.4, PTX 8.4, and Triton nightly. However, a few things still don't work:

  • upcasting FP8 to BF16 (although you can go FP8 -> FP32 -> BF16; I have a patch that does this in PTX so it works seamlessly)
  • casting to FP8 in RTNE mode
  • tl.dot() with FP8 and IEEE precision (vs. the usual TF32 precision)
  • and, of course, all of the Hopper-only features like fences, WGMMA instructions, etc.
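To make the cast semantics above concrete, here is a minimal pure-Python model of the E4M3 format that the SM89 FP8 tensor cores consume (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits; the OCP/NVIDIA variant with no infinities and NaN encoded as S.1111.111). The helper names are illustrative and not part of Triton or CUDA, and the RTNE encoder is a brute-force reference for clarity, not how the hardware or PTX performs the conversion.

```python
E4M3_BIAS = 7
E4M3_MAX = 448.0  # 0x7E: exponent bits 1111, mantissa 110 -> 2^8 * 1.75

def e4m3_to_float(code: int) -> float:
    """Decode an 8-bit E4M3 value (OCP variant: no infs, S.1111.111 is NaN)."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    mant = code & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")
    if exp == 0:  # subnormal: no implicit leading 1, fixed exponent 2^-6
        return sign * (mant / 8.0) * 2.0 ** (1 - E4M3_BIAS)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - E4M3_BIAS)

def float_to_e4m3_rtne(x: float) -> int:
    """Encode a float to E4M3, round-to-nearest-even, saturating at +/-448."""
    if x != x:  # NaN
        return 0x7F
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    best, best_err = 0, float("inf")
    for code in range(256):
        v = e4m3_to_float(code)
        if v != v:  # skip NaN encodings
            continue
        err = abs(v - x)
        if err < best_err:
            best, best_err = code, err
        elif err == best_err and (best & 1) == 1 and (code & 1) == 0:
            best = code  # tie: prefer the even mantissa LSB (round-to-even)
    return best

# Example: 0.3 is not representable; RTNE picks the nearest code, 0x2A = 0.3125.
code = float_to_e4m3_rtne(0.3)
print(hex(code), e4m3_to_float(code))  # -> 0x2a 0.3125
```

Since E4M3 has only 256 code points, exhaustive search is a handy way to cross-check a real cast path (e.g. an FP8 -> FP32 upcast kernel) against reference behavior.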

rationalism avatar Jun 11 '24 16:06 rationalism

Thank you! @rationalism

DD-DuDa avatar Jun 11 '24 22:06 DD-DuDa