
Save the quantized YoloNAS model

james-imi opened this issue 1 year ago • 4 comments

💡 Your Question

After performing quantization:

from super_gradients.training.utils.quantization.selective_quantization_utils import SelectiveQuantizer

q_util = SelectiveQuantizer(
    default_quant_modules_calibrator_weights="max",
    default_quant_modules_calibrator_inputs="histogram",
    default_per_channel_quant_weights=True,
    default_learn_amax=False,
    verbose=True,
)

q_util.quantize_module(model)

How do I save the quantized model (not as ONNX)?

Versions

No response

james-imi avatar Mar 13 '24 07:03 james-imi

Why would you want to save it in the first place? I believe you can get the model state using model.state_dict()

BloodAxe avatar Mar 15 '24 07:03 BloodAxe
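
(For reference, a minimal sketch of that workflow; the file name and the commented-out rebuild step are illustrative, not from the thread. Because SelectiveQuantizer adds extra parameters and buffers, the same architecture must be quantized again before the saved state_dict can be loaded.)

import torch

# Persist the calibrated weights (illustrative file name).
torch.save(model.state_dict(), "yolo_nas_quantized.pth")

# Later, in a fresh process: rebuild the same architecture, re-apply the
# quantizer so the module structure (and state_dict keys) match, then load.
# model = models.get(Models.YOLO_NAS_S, num_classes=80)  # assumed rebuild step
q_util.quantize_module(model)
model.load_state_dict(torch.load("yolo_nas_quantized.pth"))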

So you don't have to quantize it every time you load it?

james-imi avatar Mar 15 '24 07:03 james-imi

You don't want to use eager PyTorch inference mode on a quantized model. It would be horribly slow to run inference this way. You can, and it works, but I strongly suggest not doing it this way.

A quantized model in PyTorch is actually doing "fake quantization": weights are stored as floats, and additional quantize/dequantize layers are added on top to "pretend" the model is quantized. This is necessary for quantization-aware training and model calibration. But in practice you want to export the final quantized model to ONNX and then to TensorRT or OpenVINO, which know how to handle such a model and build an optimized quantized model from it.

BloodAxe avatar Mar 15 '24 08:03 BloodAxe
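
(A minimal export sketch of the path described above, assuming the NVIDIA pytorch-quantization backend that SelectiveQuantizer builds on; the input resolution and file name are assumptions.)

import torch
from pytorch_quantization import quant_nn

# Make TensorQuantizer export as standard QuantizeLinear/DequantizeLinear
# ONNX nodes, which TensorRT and OpenVINO can fold into a true int8 graph.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model.eval()
dummy_input = torch.randn(1, 3, 640, 640)  # assumed input shape
torch.onnx.export(model, dummy_input, "yolo_nas_int8.onnx", opset_version=13)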

@BloodAxe Hi, thanks for the info. So using super-gradients' predict() is not the go-to for local CPU server inference, is that it?

For quantization-aware training, what would be the recommended steps? Is it still okay to do QAT and then use super-gradients' predict() for this?

james-imi avatar Mar 18 '24 04:03 james-imi

It would work, but it is not an efficient way of running inference on a quantized model. Please check these notebooks to see how you can use TensorRT or ONNXRuntime for model inference:
https://github.com/Deci-AI/super-gradients/blob/03c445c0cc42743c66aa166b2e47a11a7cfc0eda/notebooks/YoloNAS_Inference_using_TensorRT.ipynb
https://github.com/Deci-AI/super-gradients/blob/03c445c0cc42743c66aa166b2e47a11a7cfc0eda/src/super_gradients/examples/model_export/models_export.ipynb

BloodAxe avatar May 03 '24 12:05 BloodAxe
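
(A minimal ONNXRuntime inference sketch under the same assumed file name and input shape; the exact preprocessing and postprocessing YoloNAS expects are covered in the notebooks linked above.)

import numpy as np
import onnxruntime as ort

# Run the exported model on CPU with ONNXRuntime.
session = ort.InferenceSession("yolo_nas_int8.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: image})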

@BloodAxe Is there any function in the repo for post-processing ONNX results?

james-imi avatar May 06 '24 09:05 james-imi