
Support for FP8 Matmuls

Open · maktukmak opened this issue on Aug 09, 2024 · 2 comments

Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights are quantized to float8; in that case, if I am not mistaken, the matmul falls back to full precision. What is the current status and roadmap for using float8 matrix multiplications, for instance through _scaled_mm?
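For reference, below is a minimal sketch of the kind of float8 matmul the question refers to, calling `torch._scaled_mm` directly. It assumes PyTorch >= 2.4 and a GPU with FP8 tensor-core support (e.g. SM 8.9+ such as H100 or RTX 4090); it is not quanto's API, and the per-tensor scaling scheme shown is only an illustrative choice.

```python
# Illustrative float8 matmul via torch._scaled_mm (not quanto code).
# Assumes PyTorch >= 2.4 and an FP8-capable CUDA GPU (SM 8.9+).
import torch

device = "cuda"
a = torch.randn(16, 32, device=device)   # activations, row-major
b = torch.randn(64, 32, device=device)   # weight stored as (out_features, in_features)

# Per-tensor scales so values fit into the float8_e4m3fn range.
fp8_max = torch.finfo(torch.float8_e4m3fn).max
a_scale = (a.abs().max() / fp8_max).float()
b_scale = (b.abs().max() / fp8_max).float()

a_fp8 = (a / a_scale).to(torch.float8_e4m3fn)
b_fp8 = (b / b_scale).to(torch.float8_e4m3fn)

# _scaled_mm expects the second operand in column-major layout, hence the transpose.
out = torch._scaled_mm(
    a_fp8,
    b_fp8.t(),
    a_scale,
    b_scale,
    out_dtype=torch.bfloat16,
)
print(out.shape)  # torch.Size([16, 64])
```

Note that `_scaled_mm` places layout and shape constraints on its operands (second operand column-major, inner dimensions multiples of 16), which is part of why wiring it into a generic quantized linear path is non-trivial.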

maktukmak — Aug 09 '24 18:08