Results: 2 issues of lishicheng1996
Following the logic of MMHA_FP8_SCALE_Q_INSTEAD_OF_K and MMHA_FP8_SCALE_P_INSTEAD_OF_V, I implemented the INT8 version. It is theoretically equivalent to the original compute logic, with no numerical accuracy degradation. I tested the speed...
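The equivalence claimed above rests on a simple algebraic fact: a per-tensor dequantization scale is a scalar factor of the Q·Kᵀ product, so it can be folded into Q instead of being applied to K. A minimal NumPy sketch (an illustration of the idea, not the TensorRT-LLM kernel; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)          # query, already FP
k_int8 = rng.integers(-128, 127, size=(4, 8), dtype=np.int8)  # quantized key
k_scale = np.float32(0.05)  # assumed per-tensor dequant scale for K

# Baseline: dequantize K first, then compute the attention logits.
logits_ref = q @ (k_int8.astype(np.float32) * k_scale).T

# Folded: scale Q instead, leaving K untouched for the GEMM.
logits_folded = (q * k_scale) @ k_int8.astype(np.float32).T

# The two orderings agree up to float32 rounding.
assert np.allclose(logits_ref, logits_folded, rtol=1e-4, atol=1e-5)
```

The same reasoning applies to folding P's scale into V rather than dequantizing V, which is what MMHA_FP8_SCALE_P_INSTEAD_OF_V exploits.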
### Describe the feature request ONNX Runtime INT8 quantization could generate an INT8 calibration cache file to store the scales or tensor ranges, just like TRT, to avoid redoing calibration with...
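The requested workflow is a round trip: run calibration once, persist the per-tensor scales, and reload them on later runs instead of recalibrating. A hedged sketch of that round trip (the plain-JSON layout and function names here are assumptions for illustration, not ONNX Runtime's or TensorRT's actual cache format):

```python
import json
import os
import tempfile

def save_calibration_cache(path, scales):
    """Persist per-tensor scales so calibration need not be repeated."""
    with open(path, "w") as f:
        json.dump(scales, f, indent=2)

def load_calibration_cache(path):
    """Return cached scales, or None to signal that calibration must run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

# Hypothetical tensor ranges produced by a calibration pass.
scales = {"input": 0.021, "conv1_out": 0.13}
path = os.path.join(tempfile.mkdtemp(), "calib_cache.json")
save_calibration_cache(path, scales)
assert load_calibration_cache(path) == scales
```

TensorRT's own `IInt8Calibrator` interface exposes the analogous pair of hooks (`readCalibrationCache` / `writeCalibrationCache`), which is the behavior this feature request asks ONNX Runtime to mirror.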
feature request
quantization