Mehmet Aktukmak
Results: 2 issues of Mehmet Aktukmak
Currently, the implemented E4M3 FP8 format follows the ARM-Intel-Nvidia style. However, there is another style, IEEE 754 (named float8_e4m3fnuz in torch), which has a different bit configuration and different min/max values. This pull...
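To show how the two E4M3 variants differ, here is a minimal pure-Python decoder sketch. It assumes that the ARM-Intel-Nvidia style corresponds to torch's `float8_e4m3fn` layout (exponent bias 7, no infinities, NaN encoded as S.1111.111). It also assumes the `float8_e4m3fnuz` layout uses exponent bias 8, no infinities, no negative zero, and reuses the negative-zero bit pattern `0x80` as its only NaN. The function names `decode_e4m3` and `format_range` are hypothetical, and the special-value handling should be checked against the dtype documentation.

```python
import math


def decode_e4m3(byte: int, fnuz: bool) -> float:
    """Decode an 8-bit E4M3 value (1 sign, 4 exponent, 3 mantissa bits).

    fnuz=False: assumed float8_e4m3fn layout (bias 7, NaN = S.1111.111).
    fnuz=True:  assumed float8_e4m3fnuz layout (bias 8, single NaN = 0x80).
    Neither layout has infinities in this sketch.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    mant = byte & 0x07
    bias = 8 if fnuz else 7
    if fnuz and byte == 0x80:
        return math.nan  # negative-zero pattern is reused as the only NaN
    if not fnuz and exp == 0x0F and mant == 0x07:
        return math.nan  # all-ones exponent and mantissa is NaN (either sign)
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (mant / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - bias)


def format_range(fnuz: bool) -> tuple:
    """Return (largest finite value, smallest positive value) over all 256 codes."""
    finite = [v for v in (decode_e4m3(b, fnuz) for b in range(256)) if not math.isnan(v)]
    return max(finite), min(v for v in finite if v > 0)


for name, fnuz in (("float8_e4m3fn", False), ("float8_e4m3fnuz", True)):
    hi, lo = format_range(fnuz)
    print(f"{name}: max={hi}, min_subnormal={lo}")
# float8_e4m3fn: max=448.0, min_subnormal=0.001953125
# float8_e4m3fnuz: max=240.0, min_subnormal=0.0009765625
```

Under these assumptions, the one-step larger bias shifts the whole fnuz range down by a factor of two. It also frees the top code `0x7F` to be a finite value (240) rather than NaN.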
Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights...
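For reference, here is a minimal sketch of what an int8 quantized matmul computes once activations and weights are both quantized: round each tensor to integers with a scale, multiply in integers with wide accumulation, then rescale the result. The symmetric per-tensor scheme and the helper names (`quantize_int8`, `int8_matmul`, `quantized_linear`) are illustrative assumptions, not the library's kernels. An FP8 path would plausibly follow the same quantize, multiply, rescale pattern with FP8 casts in place of integer rounding.

```python
def quantize_int8(matrix):
    """Symmetric per-tensor quantization: q = round(x / scale), scale = max|x| / 127."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_matmul(qa, qb):
    """Integer matmul; Python ints stand in for the int32 accumulators of a real kernel."""
    n, k, m = len(qa), len(qb), len(qb[0])
    if any(len(row) != k for row in qa):
        raise ValueError("inner dimensions do not match")
    return [[sum(qa[i][p] * qb[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def quantized_linear(activations, weights):
    """Quantize both operands, multiply in integers, then dequantize with sa * sw."""
    qa, sa = quantize_int8(activations)
    qw, sw = quantize_int8(weights)
    acc = int8_matmul(qa, qw)
    # The product of the two scales maps integer accumulators back to real values.
    return [[v * sa * sw for v in row] for row in acc]


activations = [[0.5, -1.0], [0.25, 2.0]]
weights = [[1.0, 0.0], [0.5, -0.5]]
print(quantized_linear(activations, weights))  # close to [[0.0, 0.5], [1.25, -1.0]]
```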