particleflow
ONNX post-training static quantization
Previously, in #206, we got PyTorch post-training static quantization to work, but the quantized model was not faster at inference, probably because some quantized ops are missing on CPU/GPU in the PyTorch runtime.
However, we currently use ONNX for inference, and ONNX Runtime has its own quantization tooling: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html
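The ONNX Runtime quantization docs describe a linear mapping, val_fp32 = scale * (val_quantized - zero_point). In static mode, the scale and zero point for each activation are computed ahead of time by running the model on calibration data. Below is a minimal pure-Python sketch of that arithmetic. It assumes the asymmetric uint8 scheme with plain min/max calibration. The function names are hypothetical and are not the onnxruntime API. In practice, `onnxruntime.quantization.quantize_static` performs these steps on the whole model.

```python
from typing import Iterable, List, Tuple

QMIN, QMAX = 0, 255  # uint8 range used by the asymmetric scheme in this sketch


def calibrate_minmax(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Track the min/max of one activation over calibration batches (hypothetical helper)."""
    rmin, rmax = float("inf"), float("-inf")
    for batch in batches:
        rmin = min(rmin, min(batch))
        rmax = max(rmax, max(batch))
    # Widen the range to include 0.0 so that zero is exactly representable.
    return min(rmin, 0.0), max(rmax, 0.0)


def quant_params(rmin: float, rmax: float) -> Tuple[float, int]:
    """Asymmetric scale and zero point: scale = (rmax - rmin) / (qmax - qmin)."""
    scale = (rmax - rmin) / (QMAX - QMIN)
    if scale == 0.0:
        scale = 1.0  # simplification for an all-zero range in this sketch
    zero_point = round(QMIN - rmin / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a float to uint8, saturating values outside the calibrated range."""
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping: val_fp32 = scale * (val_quantized - zero_point)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    calib = [[-1.0, 0.5, 2.0], [0.0, 3.1, -0.2]]  # made-up calibration batches
    rmin, rmax = calibrate_minmax(calib)
    scale, zp = quant_params(rmin, rmax)
    for x in [-1.0, 0.0, 0.5, 3.1]:
        q = quantize(x, scale, zp)
        print(x, q, dequantize(q, scale, zp))
```

Because zero lies inside the calibrated range, 0.0 round-trips exactly. Any other value inside the range comes back within scale/2 of its original. Values outside the range saturate, which is why the calibration set should be representative of the inputs seen in CMSSW.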
We should also try quantizing via ONNX and check whether the quantized model runs faster in CMSSW.