ONNX post-training static quantization

jpata opened this issue

Previously, in #206, we got PyTorch post-training static quantization to work, but the quantized model was not faster at inference, probably because some quantized ops are missing from the PyTorch CPU/GPU runtime.

However, we currently run inference through ONNX, and ONNX Runtime has its own quantization tooling: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html

We should also try quantizing via ONNX Runtime and check whether the quantized model runs faster in CMSSW.

Nov 01 '24 13:11 jpata