sparseml
YOLOv8 - INT4 Training
Hello, I'm trying to train YOLOv8-large in INT4 format. I took the training recipe available on SparseZoo for YOLOv8-large and modified num_bits to 4 everywhere. I also saw in #1679 that channel-wise quantisation can be added, so I've added that as well. However, the performance is quite inferior ([email protected]). Also, I will be exporting the model to ONNX for inference on an FPGA (5-bit), so I need the model to be strictly 4-bit.
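For context on how such a recipe is consumed: SparseML applies it by wrapping the training loop with a recipe manager, which the YOLOv8 integration does internally. Below is a minimal sketch of that mechanism with a placeholder module, recipe path, and step count (not taken from this issue; this particular recipe expects real YOLOv8 layer names, so the dummy module is illustrative only).

import torch
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholders: in a real run these are the YOLOv8 module, its optimizer,
# and the number of batches per epoch from the actual dataloader.
model = torch.nn.Conv2d(3, 16, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
steps_per_epoch = 100

# Load the recipe and let the manager schedule pruning and QAT during training.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

# ... the usual training loop runs here for the recipe's num_epochs ...

manager.finalize(model)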
Recipe
version: 1.1.0
metadata:

# General Hyperparams
pruning_num_epochs: 90
pruning_init_lr: 0.01
pruning_final_lr: 0.0002
weights_warmup_lr: 0
biases_warmup_lr: 0.1
qat_init_lr: 1e-4
qat_final_lr: 1e-6

# Pruning Hyperparams
init_sparsity: 0.05
pruning_start_epoch: 4
pruning_end_epoch: 50
pruning_update_frequency: 1.0

# Quantization variables
qat_start_epoch: eval(pruning_num_epochs)
qat_epochs: 3
qat_end_epoch: eval(qat_start_epoch + qat_epochs)
observer_freeze_epoch: eval(qat_end_epoch)
bn_freeze_epoch: eval(qat_end_epoch)
qat_ft_epochs: 3
num_epochs: eval(pruning_num_epochs + qat_epochs + 2 * qat_ft_epochs)
# Modifiers
training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: eval(num_epochs)

  - !LearningRateFunctionModifier
    start_epoch: 3
    end_epoch: eval(pruning_num_epochs)
    lr_func: linear
    init_lr: eval(pruning_init_lr)
    final_lr: eval(pruning_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(weights_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [0, 1]

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(biases_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [2]

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_start_epoch)
    end_epoch: eval(qat_end_epoch)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch)
    end_epoch: eval(qat_end_epoch + qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch + qat_ft_epochs)
    end_epoch: eval(qat_end_epoch + 2 * qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)
pruning_modifiers:
  - !ConstantPruningModifier
    start_epoch: eval(qat_start_epoch)
    params: ["re:^((?!dfl).)*$"]

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.46
    params:
      - model.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8999
    params:
      - model.1.conv.weight
      - model.4.m.1.cv1.conv.weight
      - model.4.m.4.cv2.conv.weight
      - model.6.m.1.cv1.conv.weight
      - model.21.m.1.cv1.conv.weight
      - model.21.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.514
    params:
      - model.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7675
    params:
      - model.2.cv2.conv.weight
      - model.12.m.0.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8117
    params:
      - model.3.conv.weight
      - model.8.cv2.conv.weight
      - model.12.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6457
    params:
      - model.4.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8627
    params:
      - model.4.cv2.conv.weight
      - model.5.conv.weight
      - model.8.m.1.cv1.conv.weight
      - model.22.cv3.1.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8764
    params:
      - model.4.m.0.cv1.conv.weight
      - model.6.m.3.cv2.conv.weight
      - model.7.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9189
    params:
      - model.4.m.1.cv2.conv.weight
      - model.6.m.5.cv1.conv.weight
      - model.15.m.2.cv1.conv.weight
      - model.18.m.0.cv1.conv.weight
      - model.18.m.2.cv1.conv.weight
      - model.22.cv3.0.1.conv.weight
      - model.22.cv3.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8305
    params:
      - model.4.m.2.cv1.conv.weight
      - model.4.m.5.cv2.conv.weight
      - model.6.cv2.conv.weight
      - model.6.m.4.cv2.conv.weight
      - model.15.m.0.cv2.conv.weight
      - model.15.m.1.cv1.conv.weight
      - model.15.m.2.cv2.conv.weight
      - model.18.cv2.conv.weight
      - model.21.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7417
    params:
      - model.4.m.2.cv2.conv.weight
      - model.18.cv1.conv.weight
      - model.22.cv3.2.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8888
    params:
      - model.4.m.3.cv2.conv.weight
      - model.6.m.3.cv1.conv.weight
      - model.15.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6063
    params:
      - model.6.cv1.conv.weight
      - model.12.cv1.conv.weight
      - model.12.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9468
    params:
      - model.6.m.0.cv1.conv.weight
      - model.21.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7907
    params:
      - model.6.m.0.cv2.conv.weight
      - model.8.m.0.cv1.conv.weight
      - model.12.m.0.cv2.conv.weight
      - model.12.m.1.cv1.conv.weight
      - model.22.cv2.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9409
    params:
      - model.6.m.1.cv2.conv.weight
      - model.18.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6811
    params:
      - model.8.cv1.conv.weight
      - model.15.cv1.conv.weight
      - model.15.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9343
    params:
      - model.8.m.0.cv2.conv.weight
      - model.8.m.1.cv2.conv.weight
      - model.18.m.0.cv2.conv.weight
      - model.18.m.1.cv1.conv.weight
      - model.21.m.0.cv1.conv.weight
      - model.21.m.1.cv2.conv.weight
      - model.22.cv3.0.0.conv.weight
      - model.22.cv3.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9771
    params:
      - model.8.m.2.cv1.conv.weight
      - model.22.cv2.0.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.989
    params:
      - model.8.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.5626
    params:
      - model.9.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.713
    params:
      - model.9.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9099
    params:
      - model.12.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.927
    params:
      - model.12.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9521
    params:
      - model.16.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9569
    params:
      - model.18.m.1.cv2.conv.weight
      - model.19.conv.weight
      - model.21.m.0.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8474
    params:
      - model.21.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9651
    params:
      - model.22.cv2.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.4
    params:
      - model.22.cv3.0.2.weight
      - model.22.cv3.1.2.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1
quantization_modifiers:
  - !QuantizationModifier
    start_epoch: eval(qat_start_epoch)
    disable_quantization_observer_epoch: eval(observer_freeze_epoch)
    freeze_bn_stats_epoch: eval(bn_freeze_epoch)
    ignore: ['Upsample', 'Concat', 'model.22.dfl.conv']
    scheme_overrides:
      model.2.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.2.m.0.cv1.conv:
        input_activations: null
      model.2.m.0.add_input_0:
        input_activations: null
      model.4.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.4.m.0.cv1.conv:
        input_activations: null
      model.4.m.0.add_input_0:
        input_activations: null
      model.4.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.5.conv:
        input_activations: null
      model.6.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.6.m.0.cv1.conv:
        input_activations: null
      model.6.m.0.add_input_0:
        input_activations: null
      model.6.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.7.conv:
        input_activations: null
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.m.0.cv1.conv:
        input_activations: null
      model.8.m.0.add_input_0:
        input_activations: null
      model.8.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.0.cv1.conv:
        input_activations: null
      model.12.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.1.cv1.conv:
        input_activations: null
      model.12.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.2.cv1.conv:
        input_activations: null
      model.12.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.0.cv1.conv:
        input_activations: null
      model.15.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.1.cv1.conv:
        input_activations: null
      model.15.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.2.cv1.conv:
        input_activations: null
      model.15.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.16.conv:
        input_activations: null
      model.16.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.0.cv1.conv:
        input_activations: null
      model.18.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.1.cv1.conv:
        input_activations: null
      model.18.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.2.cv1.conv:
        input_activations: null
      model.18.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.19.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.0.cv1.conv:
        input_activations: null
      model.21.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.1.cv1.conv:
        input_activations: null
      model.21.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.2.cv1.conv:
        input_activations: null
      model.21.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.22.cv2.0.0.conv:
        input_activations: null
      model.22.cv3.0.0.conv:
        input_activations: null
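Since the FPGA target requires strictly 4-bit weights, one way to sanity-check the exported ONNX model is to confirm that every quantized weight initializer actually fits in the 4-bit range. A small sketch using the standard onnx APIs, assuming a QDQ-style export that stores quantized weights as int8 initializers (the model path is a placeholder):

import onnx
from onnx import numpy_helper

model = onnx.load("yolov8l-int4.onnx")  # placeholder path for the exported model

# In a QDQ export, quantized weights are int8 initializers feeding
# DequantizeLinear nodes; true 4-bit values must stay within [-8, 7].
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT8:
        arr = numpy_helper.to_array(init)
        if arr.min() < -8 or arr.max() > 7:
            print(f"{init.name}: range [{arr.min()}, {arr.max()}] exceeds 4-bit")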
Hi @yoloyash, we haven't looked into taking YOLO models to 4-bit, but I do agree that this drop in accuracy is unexpected. You can try using our newer repo, which has better support for 4-bit + channel-wise quantization for PTQ, if you are interested: https://github.com/neuralmagic/compressed-tensors
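As a rough starting point with that repo, a 4-bit channel-wise weight scheme can be described with its quantization config objects. This is only a sketch from memory; the class and field names (QuantizationArgs, QuantizationScheme) and the Conv2d target are assumptions and should be checked against the compressed-tensors README:

from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

# Sketch only: 4-bit symmetric, channel-wise weights plus 4-bit asymmetric
# activations, targeting conv layers (an assumption for YOLOv8).
scheme = QuantizationScheme(
    targets=["Conv2d"],
    weights=QuantizationArgs(num_bits=4, symmetric=True, strategy="channel"),
    input_activations=QuantizationArgs(num_bits=4, symmetric=False),
)
print(scheme)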
Hi, can you tell me which versions of PyTorch, ONNX, DeepSparse, and SparseML you used to get pruning and YOLOv8 quantization working? I'm having issues with that.
Per the main README announcement, this project is being deprecated by June 2, 2025. Closing this issue as we cannot address it. Thank you for the input!