Riyad Islam
@adaber please follow the ModelOpt example [here](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq#quantize-an-onnx-model) or the [Python APIs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_onnx_quantization.html#call-ptq-function) to quantize an ONNX model. Note that `modelopt.onnx.quantization` supports the ONNXRuntime-provided [entropy calibration](https://github.com/microsoft/onnxruntime/blob/2dae8aaced747cb758d04b4b74953e37ee663460/onnxruntime/python/tools/quantization/calibrate.py#L621); see the command line help for other...
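As a rough sketch of the workflow described in the linked README, the CLI invocation looks something like the following (the exact flag names and supported values may differ between ModelOpt versions, so please treat this as an illustration and consult `python -m modelopt.onnx.quantization --help` for the authoritative list):

```shell
# Sketch of ONNX PTQ via the ModelOpt CLI; flag names follow the linked
# README but should be verified against your installed version.
# model.onnx and calib.npy are placeholder paths for your model and
# calibration data.
python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --quantize_mode=int8 \
    --calibration_data=calib.npy \
    --output_path=model.quant.onnx
```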
> people complain about not getting as good of a result

It means that, after TensorRT deployment, the explicitly quantized (EQ) network's latency can sometimes exceed the implicitly quantized (IQ) network's latency. The ModelOpt team is actively working with TensorRT...