Remove QuantizeLinear/DequantizeLinear nodes from ONNX model
Hi, I trained a YOLOv8 model and exported it to ONNX using the quantization recipe below. I set weight_bits=8 and activation_bits=8 so that the quantized model's entire inference flow would stay in fixed-point uint8. However, QuantizeLinear and DequantizeLinear nodes are still present in the exported graph, and they convert the activations back to floating-point tensors. When I checked final_recipe.yaml, I saw that my activation_bits setting had been disabled. Is there any way to get rid of these nodes while still preserving model performance, or another recipe setting that produces fully integer inference? Thanks.
version: 1.1.0
# General variables
num_epochs: 20
init_lr: 1.e-3
final_lr: 1.e-6
lr_func: cyclic
# Quantization variables
qat_start_epoch: 1
observer_freeze_epoch: 3
bn_freeze_epoch: 3
training_modifiers:
  - !EpochRangeModifier
    start_epoch: 1
    end_epoch: eval(num_epochs)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_start_epoch)
    end_epoch: eval(num_epochs)
    lr_func: cosine
    init_lr: eval(init_lr)
    final_lr: eval(final_lr)
quantization_modifiers:
  - !QuantizationModifier
    start_epoch: eval(qat_start_epoch)
    disable_quantization_observer_epoch: eval(observer_freeze_epoch)
    freeze_bn_stats_epoch: eval(bn_freeze_epoch)
    # ignore: ['Upsample', 'Concat']
    # tensorrt: False
    quantize_linear_activations: True
    quantize_conv_activations: True
    # quantize_embedding_activations: True
    # quantize_embeddings: True
    # reduce_range: True
    # exclude_module_types: ['Concat', 'Upsample']
    weight_bits: 8
    activation_bits: 8
    model_fuse_fn_name: conv_bn_relus
    # exclude_batchnorm: True
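
For reference, a minimal sketch (the model filename below is a placeholder) of how the remaining QuantizeLinear/DequantizeLinear nodes can be listed with the onnx package; each such pair quantizes a float tensor to uint8 and immediately dequantizes it back to float, which is why the activations end up floating-point between quantized ops:

```python
import onnx
from collections import Counter

# Placeholder path to the exported quantized model
model = onnx.load("yolov8_quantized.onnx")

# Count the node types in the graph; QuantizeLinear/DequantizeLinear pairs
# mean the model is in QDQ form, so activations go float -> uint8 -> float
# instead of staying in uint8 end to end.
op_counts = Counter(node.op_type for node in model.graph.node)
print("QuantizeLinear:", op_counts["QuantizeLinear"])
print("DequantizeLinear:", op_counts["DequantizeLinear"])

# Show which tensors are dequantized back to floating point.
for node in model.graph.node:
    if node.op_type == "DequantizeLinear":
        print(f"{node.name}: dequantizes {node.input[0]}")
```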