Liger-Kernel
gemm fp8 e4m3
Summary
Implemented an FP8 GEMM kernel that uses the E4M3 representation.
Testing Done
Tested square matrices of varying sizes (64, 256, 512, 1024, 2048) and non-square matrices of varying sizes, comparing the results against torch.matmul. The backward comparison uses appropriate casting, because torch.matmul doesn't support the fp8_e4m3 dtype for backward.
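To illustrate why the reference needs casting, here is a minimal pure-Python sketch that emulates E4M3 rounding (1 sign, 4 exponent, 3 mantissa bits, bias 7, largest finite value 448) and builds a reference product from inputs that were rounded to E4M3 first. It is not the test code from this PR: the helper names `round_to_e4m3`, `quantize`, and `matmul` are hypothetical, and saturating out-of-range values at ±448 is an assumption (a real cast may handle overflow differently).

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 value ("fn" variant)
E4M3_MIN_EXP = -6     # exponent of the smallest normal value (bias 7)
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even).

    Assumption: out-of-range values saturate at +/-448.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                 # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)       # subnormals share the smallest spacing
    step = 2.0 ** (exp - MANTISSA_BITS)  # gap between neighbouring values
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize(m):
    """Round every entry of a nested-list matrix to E4M3."""
    return [[round_to_e4m3(v) for v in row] for row in m]


def matmul(a, b):
    """Plain nested-list matrix product in Python floats."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[0.3, 1.0625], [-2.7, 100.0]]
b = [[0.5, 3.3], [7.0, -0.11]]

# Reference with inputs cast to E4M3 first: this is what an FP8 kernel
# should be compared against, up to accumulation error.
reference = matmul(quantize(a), quantize(b))

# Product of the unrounded inputs, shown only to expose the rounding gap.
full_precision = matmul(a, b)
```

With three mantissa bits, input rounding alone moves values noticeably (for example 0.3 becomes 0.3125 and 100.0 becomes 96.0), so comparing an FP8 kernel against an uncast high-precision product would mostly measure quantization error rather than kernel correctness.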
FP8 GEMM only works on SM_89+ (compute capability 8.9 or newer).
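A guard for this requirement could look like the sketch below. The function name `supports_fp8_gemm` is hypothetical and not part of this PR; it only compares a (major, minor) compute capability pair against 8.9.

```python
def supports_fp8_gemm(major: int, minor: int) -> bool:
    """Return True if compute capability (major, minor) is at least 8.9 (SM_89)."""
    return (major, minor) >= (8, 9)


# On a machine with PyTorch and a CUDA device, the pair can be obtained from
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
rtx_4090 = supports_fp8_gemm(8, 9)   # RTX 4090 reports compute capability 8.9
a100 = supports_fp8_gemm(8, 0)       # A100 reports compute capability 8.0
```

A test suite could use such a check to skip the FP8 cases on older GPUs instead of failing.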
- Hardware Type: RTX 4090
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence