Torch-Pruning
Yolov7 pruned model does not detect anything?
@VainF Thanks for the amazing repo. I tried to run inference with the pruned yolov7 model on one image, but the pruned model did not detect anything (the output image has no bboxes). https://github.com/VainF/Torch-Pruning/blob/master/benchmarks/prunability/readme.md#3-yolo-v7
python yolov7_detect_pruned.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
I saw that in yolov7_detect_pruned.py you already set ignored_layers:
################################################################################
# Pruning
example_inputs = torch.randn(1, 3, 224, 224).to(device)
imp = tp.importance.MagnitudeImportance(p=2) # L2 norm pruning
ignored_layers = []
from models.yolo import Detect
for m in model.modules():
    if isinstance(m, Detect):
        ignored_layers.append(m)
print(ignored_layers)
iterative_steps = 1 # progressive pruning
pruner = tp.pruner.MagnitudePruner(
    model,
    example_inputs,
    importance=imp,
    iterative_steps=iterative_steps,
    ch_sparsity=0.5, # remove 50% channels, ResNet18 = {64, 128, 256, 512} => ResNet18_Half = {32, 64, 128, 256}
    ignored_layers=ignored_layers,
)
base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)
pruner.step()
pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(model, example_inputs)
print(model)
print("Before Pruning: MACs=%f G, #Params=%f G"%(base_macs/1e9, base_nparams/1e9))
print("After Pruning: MACs=%f G, #Params=%f G"%(pruned_macs/1e9, pruned_nparams/1e9))
####################################################################################
But in the printed model I saw this for the Detect module before pruning:
(105): Detect(
  (m): ModuleList(
    (0): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(1024, 255, kernel_size=(1, 1), stride=(1, 1))
  )
)
and after pruning:
(105): Detect(
  (m): ModuleList(
    (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
  )
)
Could you @VainF check it again? Thanks
Hello @aidevmin. It requires post-training. Please finetune the pruned model on COCO for a few epochs with a small learning rate.
Thanks @VainF for the quick response. So I need to run yolov7_train_pruned.py, is that right? I looked at yolov7_train_pruned.py too, but its input is yolov7_training.pt, not yolov7.pt.
- Did you check the mAP of the model output by yolov7_train_pruned.py on COCO?
- The pruned model has a different architecture from the original model. Does that mean we need to save the whole model (architecture and weights) to a .pth file instead of saving only the weights, so the architecture can be loaded later? Is that right? It may be a problem when we use the pruned model in another environment.
I will try running yolov7_train_pruned.py.
- mAP: The performance of pruned yolov7 has not been checked.
- Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.
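For reference, a minimal sketch of the save/load flow being suggested, following the pattern in the Torch-Pruning README (the build_unpruned_yolov7() helper is hypothetical; any unpruned yolov7 instance works):
import torch
import torch_pruning as tp

# after pruning (and finetuning): save weights plus the pruned attributes such as conv.in_channels
state_dict = tp.state_dict(pruned_model)
torch.save(state_dict, 'yolov7_pruned.pth')

# later, in a fresh process: start from an UNPRUNED model and let tp resize it
new_model = build_unpruned_yolov7()  # hypothetical helper returning a standard, unpruned yolov7
loaded = torch.load('yolov7_pruned.pth', map_location='cpu')
tp.load_state_dict(new_model, state_dict=loaded)  # shrinks layers to the pruned shapes, then loads the weights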
Thanks a lot. I will try it.
> - mAP: The performance of pruned yolov7 has not been checked.
> - Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.
I noticed that the model obtained after pruning and finetuning with yolov7_train_pruned.py cannot be reparameterized.
> - mAP: The performance of pruned yolov7 has not been checked.
> - Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.
@VainF thanks, it works. But it seems that tp.state_dict saves the whole model (weights and architecture), because the pruned checkpoint is larger than the original .pt file (weights only). Is there any way to save only the weights and still ensure the model is loaded with the pruned architecture properly?
> I noticed that the model obtained after pruning and finetuning with yolov7_train_pruned.py cannot be reparameterized.
@aidevmin, I don't think that reparametrization is needed, as exporting to ONNX will apply all of the necessary optimisations to the model as well as export the correct weights (EMA if applicable).
- A source indicating that yolov7 reparametrization is mainly about fusing Conv + BN, stripping old weights, and converting the last layer from IDetect to Detect: https://github.com/WongKinYiu/yolov7/commit/09d6293f32f38a48de820b6bf47b85736bf8c81c
- A source indicating that this optimisation is already performed when exporting to ONNX: https://github.com/pytorch/pytorch/pull/40547
I can think of two simple ways to shortcut this problem:
- Train using Detect instead of IDetect (yolov7.pt + cfg/deploy/.) (my colleague tried training with both Detect and IDetect, and the difference is about 1 point of mAP@0.5 in favour of IDetect).
- Convert the last layer from IDetect to Detect manually (a rough sketch of the weight folding follows below).
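For the second option, here is a rough sketch of the weight folding, assuming the usual YOLOv7 IDetect layout (1x1 convs in .m, with matching ImplicitA modules in .ia and ImplicitM modules in .im); it mirrors what the official reparameterization does, and afterwards the folded convs can be copied into a plain Detect head (or ia/im reset to zeros/ones):
import torch

def fold_implicit_into_detect(idetect):
    # Fold ImplicitA/ImplicitM into the 1x1 detection convs so the layer
    # behaves like a plain Detect head.
    with torch.no_grad():
        for conv, ia, im in zip(idetect.m, idetect.ia, idetect.im):
            c_out = conv.weight.shape[0]
            # ImplicitA adds a per-channel constant to the input: fold it into the bias (b <- b + W @ ia)
            conv.bias += torch.matmul(
                conv.weight.reshape(c_out, -1), ia.implicit.reshape(-1, 1)
            ).squeeze(1)
            # ImplicitM scales the output per channel: fold it into both weight and bias
            conv.bias *= im.implicit.reshape(-1)
            conv.weight *= im.implicit.reshape(c_out, 1, 1, 1)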
> mAP: The performance of pruned yolov7 has not been checked.
@VainF I have retrained the pruned yolov7 on COCO for 300 epochs, and there is a noticeable degradation of about 5-6 points across all metrics, which is amazing considering that we eliminated 75% of the network. (I am currently trying to apply knowledge distillation so that the pruned model (student) can learn from the baseline model (teacher) during its retraining in your script, as well as excluding some of the most sensitive layers from pruning.)
Note: the pruning must be done at 0.5 sparsity to get good acceleration with TensorRT; other values will actually hinder the engine's speed.
> Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.
> @VainF thanks, it works. But it seems that tp.state_dict saves the whole model (weights and architecture), because the pruned checkpoint is larger than the original .pt file (weights only). Is there any way to save only the weights and still ensure the model is loaded with the pruned architecture properly?
@aidevmin The solution that I implemented for saving while being able to reload the model is to create a yaml configuration file for the pruned model at the moment of pruning; loading the pruned model with this configuration works perfectly. There are also EMA issues that need to be addressed.
I may submit a PR with these adjustments in the following weeks if I find the time. Don't hesitate to ask for clarifications.
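The actual script is not shown here, but the general idea could be sketched like this (purely illustrative: it scales the out-channel argument of channel-producing entries in the original cfg by the pruning ratio, which only matches the pruned model if pruning was uniform; ignored layers and channel rounding would need special handling):
import yaml

def write_pruned_yaml(original_yaml, pruned_yaml, ch_sparsity=0.5, ignored_indices=()):
    with open(original_yaml) as f:
        cfg = yaml.safe_load(f)
    layer_idx = 0
    for section in ('backbone', 'head'):
        for entry in cfg[section]:
            _frm, _number, module, args = entry
            # Conv-like entries in the yolov7 cfg carry out_channels as args[0]
            if module in ('Conv', 'RepConv', 'SPPCSPC') and layer_idx not in ignored_indices:
                args[0] = int(args[0] * (1.0 - ch_sparsity))
            layer_idx += 1
    with open(pruned_yaml, 'w') as f:
        yaml.safe_dump(cfg, f, sort_keys=False)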
@AymenBOUGUERRA Thanks for the detailed response.
> I don't think that reparametrization is needed, as exporting to ONNX will apply all of the necessary optimisations to the model as well as export the correct weights (EMA if applicable).
Yes, I agree with you. I can successfully export .pt to ONNX without reparameterization.
One interesting thing I found is that after removing only 1% of the channels, the inference time of the pruned model (TRT engine) is larger than that of the model before pruning. It is surprising. I will investigate more and let you know.
@aidevmin Hello again,
> Yes, I agree with you. I can successfully export .pt to ONNX without reparameterization.
Be careful to always load the model using the function provided in the repo, such as attempt_load(). The reason is that un-reparameterized models have two kinds of weights in the checkpoint, "model" and "ema"; the EMA weights are about 2 to 5 points better than the default weights in terms of mAP and other metrics, and the provided function will try to load the EMA weights first and fall back to "model" if EMA is not present.
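For reference, the checkpoint handling being described boils down to roughly this (a simplified sketch; the real attempt_load() also fuses layers and supports ensembles):
import torch

ckpt = torch.load('yolov7_pruned.pt', map_location='cpu')
model = ckpt['ema' if ckpt.get('ema') else 'model']  # prefer EMA weights, fall back to 'model'
model = model.float().eval()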
> One interesting thing I found is that after removing only 1% of the channels, the inference time of the pruned model (TRT engine) is larger than that of the model before pruning. It is surprising. I will investigate more and let you know.
I have already encountered this issue. From my investigation, it seems that the pruning ratio must be of the form 1 - (1/2^n) with 0 < n < 5 in order to get a speedup in TensorRT, and using such aggressive pruning ratios will require you not only to finetune the model but to retrain it from scratch, as the feature maps are utterly destroyed.
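Concretely, that formula gives only a handful of usable ratios:
# pruning ratios of the form 1 - 1/2**n for 0 < n < 5
valid_ratios = [1 - 1 / 2**n for n in range(1, 5)]
print(valid_ratios)  # [0.5, 0.75, 0.875, 0.9375]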
Don't hesitate to ask if you have any questions or need clarification.
@AymenBOUGUERRA Thank you so much for the information.
> It seems that the pruning ratio must be of the form 1 - (1/2^n) with 0 < n < 5 in order to get a speedup in TensorRT, and using such aggressive pruning ratios will require you not only to finetune the model but to retrain it from scratch, as the feature maps are utterly destroyed.
Maybe we need to find another way, such as KD, to keep reasonable accuracy. I don't know why some pruned models are slower than the original model.
@aidevmin
> Maybe we need to find another way, such as KD, to keep reasonable accuracy. I don't know why some pruned models are slower than the original model.
I have been trying to use KD to reduce the overall degradation from the pruning, including loss distillation and feature distillation. Even though the models converge faster, they are unable to surpass the pruned models trained without KD in terms of accuracy, even though the KD itself works (I tried teaching the model without annotations or ground truth, only by imitation, and it converges). In my opinion this can only mean that the pruned models are simply too small to reach the accuracy of their full-size counterparts. I also tried this implementation: https://github.com/wonbeomjang/yolov5-knowledge-distillation
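A minimal sketch of the kind of output-imitation term described above (not the recipe from the linked repo; compute_loss, imgs and targets are the usual YOLO training-loop names and are only illustrative):
import torch
import torch.nn.functional as F

def distillation_loss(student_preds, teacher_preds, det_loss, kd_weight=0.5):
    # Mix the normal detection loss with an L2 imitation term on the raw per-scale
    # outputs; the shapes match because the Detect head was excluded from pruning.
    kd = sum(F.mse_loss(s, t) for s, t in zip(student_preds, teacher_preds))
    return det_loss + kd_weight * kd

# rough usage inside the training loop:
# student_preds = student(imgs)          # student in train() mode -> raw per-scale maps
# with torch.no_grad():
#     teacher_preds = teacher(imgs)[1]   # teacher in eval() mode -> (decoded, raw); take the raw maps
# det_loss, _ = compute_loss(student_preds, targets)
# loss = distillation_loss(student_preds, teacher_preds, det_loss)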
I also tried excluding layers from being pruned. This is an alternative to changing the channel sparsity of 0.5: instead of using 0.1 sparsity, you can use 0.5 and exclude some layers to reduce the accuracy loss along with the number of parameters pruned (a rough sketch of how to do this with Torch-Pruning follows below). In my tests, excluding 15 layers from pruning reduced the mAP loss from 6 points to 4 points and still gave us a considerable speedup.
I haven't tried excluding more, but I am pretty sure that the mAP loss as a function of the number of layers excluded from pruning will follow a logarithmic curve, starting at -6 and eventually reaching 0.
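A rough sketch of how this layer exclusion maps onto Torch-Pruning's ignored_layers, assuming the usual yolov7 structure where model.model is the numbered nn.Sequential of blocks (the indices 11 and 28 are just examples):
import torch_pruning as tp
from models.yolo import Detect

exclude_indices = {11, 28}                    # layer indices to keep unpruned
ignored_layers = []
for i, m in enumerate(model.model):           # model.model holds the numbered yolov7 blocks
    if i in exclude_indices or isinstance(m, Detect):
        ignored_layers.append(m)

pruner = tp.pruner.MagnitudePruner(
    model, example_inputs,
    importance=tp.importance.MagnitudeImportance(p=2),
    ch_sparsity=0.5,
    ignored_layers=ignored_layers,
)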
@AymenBOUGUERRA
> The solution that I implemented for saving while being able to reload the model is to create a yaml configuration file for the pruned model at the moment of pruning; loading the pruned model with this configuration works perfectly.
Could you share the code for creating the yaml configuration file for the pruned model at pruning time? Thanks a lot.
@aidevmin Create a .py file in the same place as the standard yolov7 train script and copy the whole code.
You can use --ignored_layers "11 28" to ignore layers 11 and 28 so they don't get pruned.
The argument --original_configuration will load the original yaml file used for this configuration in order to calculate and create the new pruned yaml configuration (by default it will load the yolov7 config, but it should be changed if you are working with yolov7x, for example).
When you use this script to prune and train a model, the pruned model's yaml file will be generated in the same folder where your training run is saved, typically runs/train/1/pruned.yaml.
You can then use --cfg_pruned "runs/train/1/pruned.yaml" to reload the pruned model with its normal weights and EMA weights and continue your training.
If you wish to load the model for inference, you do not need this script, because the attempt_load() function in the repo handles the pruned model correctly.
Hello @VainF, I'm a beginner and I'm also having problems detecting anything after pruning yolov5s. Is there fine-tuning code for yolov5, please?
Hi @AFallDay, I'm sorry there is no finetuning code for yolov5 in this project. You can try the official training code from yolov5.