Umang Yadav
This can be done by shifting the zero point by -128.
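As a sketch of the zero-point shift (assuming a standard uint8 asymmetric quantization scheme; the scale, zero point, and input values here are illustrative, not taken from MIGraphX):

```python
import numpy as np

# Hedged sketch: a uint8 quantization (zero_point in [0, 255]) can be
# re-expressed as int8 by shifting both the quantized values and the
# zero point by -128; the dequantized result is unchanged.
x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale = np.float32(1.0 / 127.0)
zp_u8 = np.uint8(128)

q_u8 = np.clip(np.round(x / scale) + zp_u8, 0, 255).astype(np.uint8)
deq_u8 = (q_u8.astype(np.int32) - zp_u8) * scale

# shift by -128: uint8 -> int8
q_i8 = (q_u8.astype(np.int32) - 128).astype(np.int8)
zp_i8 = np.int8(int(zp_u8) - 128)  # becomes 0 here
deq_i8 = (q_i8.astype(np.int32) - zp_i8) * scale

assert np.allclose(deq_u8, deq_i8)
```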
### Problem Description

### Parsing OCP FP8 Model

This would require MIGraphX to expose the E4M3FN data type in the IR. Currently only the E4M3FNUZ type is exposed. It is probably not...
`migraphx-driver` doesn't appear to quantize to int8 or fp8: https://github.com/ROCm/AMDMIGraphX/blob/1199fbeeb83539a5cb6a033745e67bfb661875ca/src/driver/verify.cpp#L97 We need to add quantization for 8 bits as well as the combination of `--fp16 --int8`.
https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/pull/2506/files This PR had to disable FP8 tests for the CPU backend. The reference implementation does a Float -> FP8 -> Float conversion, but the CPU backend does the entire...
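The reference-style round trip can be sketched as snapping each float to the nearest representable E4M3FN value (a simplified illustration, not the MIGraphX implementation; it saturates at ±448 but omits NaN handling and rounding-mode details):

```python
import numpy as np

# Hedged sketch of a Float -> FP8 -> Float round trip: enumerate the
# representable E4M3FN magnitudes and round each input to the nearest one.
def e4m3fn_values():
    vals = [0.0]
    for exp in range(16):
        for mant in range(8):
            if exp == 15 and mant == 7:
                continue  # this pattern encodes NaN in E4M3FN
            if exp == 0:
                v = (mant / 8.0) * 2.0 ** (1 - 7)      # subnormals
            else:
                v = (1.0 + mant / 8.0) * 2.0 ** (exp - 7)
            vals.append(v)
    return np.array(sorted(set(vals)))

def roundtrip_e4m3fn(x):
    grid = e4m3fn_values()
    mag = np.minimum(np.abs(x), grid[-1])              # saturate at 448
    idx = np.argmin(np.abs(grid[None, :] - mag[:, None]), axis=1)
    return np.sign(x) * grid[idx]

print(roundtrip_e4m3fn(np.array([0.1, 1.3, 500.0, -2.7])))
```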
If the E4M3FN model is quantized using QDQ pairs, then it can be converted to E4M3FNUZ types by multiplying the scales by a constant. This would allow using E4M3FN models inside...
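The scale trick can be sketched numerically. E4M3FNUZ uses exponent bias 8 while E4M3FN uses bias 7, so the same bit pattern decodes to half the value; doubling the QDQ scale therefore preserves the dequantized result. This is a simplified sketch that covers normal and subnormal encodings but ignores NaN and the saturation values; the bit pattern and scale are illustrative:

```python
# Hedged sketch: same 8-bit pattern decodes to half the value under
# E4M3FNUZ (bias 8) vs E4M3FN (bias 7), so doubling the QDQ scale
# preserves the dequantized result.
def decode_e4m3(bits, bias):
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 3) & 0xF
    mant = bits & 0x7
    if exp == 0:                                   # subnormal
        return sign * (mant / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - bias)

pattern = 0b0_0111_010        # arbitrary illustrative bit pattern
fn_val = decode_e4m3(pattern, bias=7)    # E4M3FN reading: 1.25
fnuz_val = decode_e4m3(pattern, bias=8)  # E4M3FNUZ reading: 0.625

scale_fn = 0.0625             # illustrative QDQ scale
scale_fnuz = scale_fn * 2.0   # the constant factor is 2

assert fn_val * scale_fn == fnuz_val * scale_fnuz
```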