[Feature Request] Add FP8 GEMM to Ada (SM89)
Hello, it's my understanding that Triton currently does not support the use of FP8 tensor cores on Ada Lovelace GPUs (SM_89), correct? I've noticed that cuBLAS has implemented support for it, and TransformerEngine also provides FP8 support on Ada.
I am eagerly hoping for prompt support so that we can utilize FP8 on the 4090 as well. 🥺
I would greatly appreciate your help!
Does https://github.com/NVIDIA/TransformerEngine use cuBLAS?
Yes.
https://github.com/NVIDIA/TransformerEngine/blob/b5e13a16611be162538f489f3fd7096518640e15/transformer_engine/common/gemm/cublaslt_gemm.cu#L41
Are you aware of any PTX instructions that support FP8 on SM89?
CUDA 12.4 or newer may support it now.
see: https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/include/cutlass/arch/mma_sm89.h#L57
Yeah, I think so.
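For reference, the CUTLASS header linked above wraps the SM89 FP8 tensor-core instruction `mma.sync.aligned.m16n8k32.row.col.f32.e4m3.e4m3.f32`. As a rough sketch of what this enables at the Triton level (assuming a recent Triton build and CUDA 12.4+, per the report further down; the kernel and tensor names are illustrative), a `tl.dot` over FP8 inputs is what should lower to that instruction on SM89:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fp8_dot_kernel(a_ptr, b_ptr, c_ptr,
                   M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    # One-tile GEMM: load FP8 A (M x K) and B (K x N), accumulate in FP32.
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])
    # On SM89 with a new enough toolchain, this should lower to the
    # mma.sync...e4m3 tensor-core instruction; the accumulator is FP32.
    c = tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)

a = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(32, 16, device="cuda").to(torch.float8_e4m3fn)
c = torch.empty(16, 16, device="cuda", dtype=torch.float32)
fp8_dot_kernel[(1,)](a, b, c, M=16, N=16, K=32)
```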
PyTorch 2.3 has support for FP8 GEMM on Ada Lovelace, which we use in vLLM; it would be great to have this supported in Triton. PyTorch PR: https://github.com/pytorch/pytorch/pull/118881
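As a hedged illustration of what that PR exposes (shapes and names here are made up), the FP8 GEMM lands in the private `torch._scaled_mm` op, which dispatches to cuBLASLt:

```python
import torch

# Sketch using PyTorch's private torch._scaled_mm. This is not a stable
# API: the scale arguments and the return type (an (out, amax) tuple on
# PyTorch 2.3 vs. a single tensor on newer releases) have changed across
# versions, so check the signature of your installed build.
a = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)
# cuBLASLt wants the second operand column-major, hence the transpose.
b = torch.randn(64, 32, device="cuda").to(torch.float8_e4m3fn).t()
scale = torch.tensor(1.0, device="cuda")  # per-tensor scale in float32
result = torch._scaled_mm(a, b, scale_a=scale, scale_b=scale,
                          out_dtype=torch.bfloat16)
```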
@DD-DuDa I tested this on my 4090 and it works now, with CUDA 12.4, PTX 8.4, and Triton nightly. However, there are a few things that still don't work:
- upcasting FP8 to BF16 (although you can go FP8 -> FP32 -> BF16 as a workaround, sketched after this list; I have a patch that will do this in PTX so it works seamlessly)
- casting to FP8 in RTNE (round-to-nearest-even) mode
- tl.dot() with FP8 and IEEE precision (vs. the usual TF32 precision)
- of course, all of the Hopper features like fences, WGMMA instructions, etc.
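A minimal sketch of the FP8 -> FP32 -> BF16 workaround from the first bullet, assuming a Triton nightly that accepts `torch.float8_e4m3fn` inputs (the kernel name and block size are illustrative):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def upcast_fp8_to_bf16(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    # Direct FP8 -> BF16 casts don't work yet on SM89, so hop through FP32.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    y = x.to(tl.float32).to(tl.bfloat16)
    tl.store(y_ptr + offs, y, mask=mask)

x = torch.randn(4096, device="cuda").to(torch.float8_e4m3fn)
y = torch.empty_like(x, dtype=torch.bfloat16)
grid = (triton.cdiv(x.numel(), 1024),)
upcast_fp8_to_bf16[grid](x, y, x.numel(), BLOCK=1024)
```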
Thank you! @rationalism