Comparison exception: The values for attribute 'shape' do not match: torch.Size([]) != torch.Size([1, 1, 40, 40, 2]).
Describe the issue: I am pruning a yolov7 model with L1NormPruner, following this example: https://github.com/microsoft/nni/blob/master/examples/compression/pruning/norm_pruning.py . I added the code below after this line: https://github.com/WongKinYiu/yolov7/blob/84932d70fb9e2932d0a70e4a1f02a1d6dd1dd6ca/train.py#L100
from nni.compression.pruning import L1NormPruner, L2NormPruner, FPGMPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils import auto_set_denpendency_group_ids
config_list = [{
    # 'total_sparsity': 0.1,
    'sparse_ratio': 0.5,
    'op_types': ['Conv2d'],
}]
dummy_input = torch.rand([1, 3, 640, 640]).to(device)
config_list = auto_set_denpendency_group_ids(model, config_list, dummy_input)
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
pruner.unwrap_model()
model = ModelSpeedup(model, dummy_input, masks).speedup_model()
torch.save(model, "pruning_nni_yolov7.pt")
exit()
But I got this error:
First diverging operator:
Node diff:
- %model : __torch__.torch.nn.modules.container.___torch_mangle_398.Sequential = prim::GetAttr[name="model"](%self.1)
? --
+ %model : __torch__.torch.nn.modules.container.___torch_mangle_812.Sequential = prim::GetAttr[name="model"](%self.1)
? ++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%2445 : Tensor = prim::Constant[value={2}](), scope: __module.model.105 # /opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py:135:0
Source Location:
/opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(135): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl
/opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(625): forward_once
/opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(599): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl
/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(1056): trace_module
/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(794): trace
/usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(91): _trace
/usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(67): __init__
/usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(265): __init__
/usr/local/lib/python3.8/dist-packages/nni/compression/utils/shape_dependency.py(58): __init__
/usr/local/lib/python3.8/dist-packages/nni/compression/utils/shape_dependency.py(135): __init__
/usr/local/lib/python3.8/dist-packages/nni/compression/utils/dependency.py(34): auto_set_denpendency_group_ids
train_pruning_yolov7.py(112): train
train_pruning_yolov7.py(639): <module>
Comparison exception: The values for attribute 'shape' do not match: torch.Size([]) != torch.Size([1, 1, 40, 40, 2]).
The same error also occurs with L2NormPruner and FPGMPruner. I have attached the log file: pruning_l1norm.log.
@J-shang @ultmaster please help me.
Environment:
- NNI version: 3.0
- Training service (local|remote|pai|aml|etc): local
- Client OS: Ubuntu
- Server OS (for remote mode only):
- Python version: 3.8.10
- PyTorch/TensorFlow version: 2.0.1+cu117
- Is conda/virtualenv/venv used?: Docker
- Is running in Docker?: Yes
Configuration:
- Experiment config (remember to remove secrets!):
- Search space:
Log message:
- nnimanager.log:
- dispatcher.log:
- nnictl stdout and stderr:
How to reproduce it?:
- Clone https://github.com/WongKinYiu/yolov7
- Edit file train.py as described above.
- Run command
python3 train_pruning_yolov7.py --workers 8 --device 0 \
--batch-size 1 --data data/coco.yaml \
--img 640 640 --cfg cfg/training/yolov7.yaml \
--weights 'yolov7.pt' --name yolov7_nni \
--hyp data/hyp.scratch.custom.yaml --epochs 1
I had a similar issue and solved it by putting the model in eval() and passing dummy input through it once, before doing pruning.
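The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the commenter's actual code: `LazyGridHead` is a hypothetical toy stand-in for a detection head that builds a cached grid on its first eval-mode forward pass. The assumption is that a similar lazily built tensor is what makes the tracer see different constants across invocations. Switching to eval() and running one forward pass fills the cache before `torch.jit.trace` (which the stack trace shows NNI calling) sees the model.

```python
import torch
import torch.nn as nn


def make_grid(nx, ny):
    # Build a [1, 2, ny, nx] tensor of (x, y) cell coordinates.
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing="ij")
    return torch.stack((xv, yv), 0).unsqueeze(0).float()


class LazyGridHead(nn.Module):
    """Hypothetical toy head: the grid is created on the first eval forward."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, kernel_size=1)
        self.grid = None  # filled lazily, like a cached anchor grid

    def forward(self, x):
        y = self.conv(x)
        if self.training:
            return y  # training path never touches the grid
        if self.grid is None:
            self.grid = make_grid(y.shape[3], y.shape[2])
        return y + self.grid


model = LazyGridHead()
dummy_input = torch.rand(1, 3, 8, 8)

# Step 1: eval mode, so the inference branch (and its cache) is exercised.
model.eval()

# Step 2: one warm-up pass so the lazily built grid exists before tracing.
with torch.no_grad():
    model(dummy_input)

# Step 3: tracing now sees the same constant grid on every invocation.
traced = torch.jit.trace(model, dummy_input)
```

In the script from the issue, the equivalent would presumably be calling model.eval() and model(dummy_input) once before auto_set_denpendency_group_ids, then switching back to model.train() before fine-tuning. Whether that is sufficient for yolov7 was not confirmed in this thread.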
@MarkusDrange Thank you so much for the suggestion. I am going to try it later. Instead of NNI, I used torch-pruning. Could you share your experience pruning yolov7 with NNI? Is the tradeoff between mAP and speed good?
@MarkusDrange
Can you save the pruned yolov7 model by using torch.save(model, <path-to-save>)?
Sorry, I am not really working on an identical case, I am working on tracing a yolov8 model and just mentioned the solution as the error message I got was very similar to yours.
One possible fix: the yolov7 model may also have a hierarchy of classes (as my yolov8 does), in which case model.model is the actual model that you want to save.
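As a rough sketch of that idea, assuming a wrapper class that keeps the real network in a `.model` attribute: `DetectionWrapper` below is a hypothetical toy, not the yolov7 or yolov8 class, and only the inner module is passed to `torch.save`. Whether this resolves the saving problem for a pruned yolov7 model is not confirmed here.

```python
import os
import tempfile

import torch
import torch.nn as nn


class DetectionWrapper(nn.Module):
    """Hypothetical outer class that holds the real network in `.model`."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.model(x)


wrapper = DetectionWrapper().eval()
inner = wrapper.model  # the plain nn.Module that actually gets saved

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "inner_model.pt")
    torch.save(inner, path)
    # weights_only=False unpickles a whole module; only use it on trusted files.
    restored = torch.load(path, weights_only=False)

x = torch.rand(1, 3, 16, 16)
with torch.no_grad():
    same_output = torch.allclose(restored(x), wrapper(x))
```

Saving `inner.state_dict()` instead of the whole module is the more portable option when the architecture can be rebuilt in code, but after speedup the layer shapes change, so the pruned architecture would have to be reconstructed first.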
@MarkusDrange Thanks. I have one more question. Could you fine-tune with multiple GPUs after pruning?
I had a similar issue and solved it by putting the model in eval() and passing dummy input through it once, before doing pruning.
Can you share the implementation code?
Sorry, I am not really working on an identical case, I am working on tracing a yolov8 model and just mentioned the solution as the error message I got was very similar to yours.
One possible fix: the yolov7 model may also have a hierarchy of classes (as my yolov8 does), in which case model.model is the actual model that you want to save.
I followed your approach but still got the same error. Can you share your implementation code?