
Comparison exception: The values for attribute 'shape' do not match: torch.Size([]) != torch.Size([1, 1, 40, 40, 2]).

Open aidevmin opened this issue 2 years ago • 7 comments

Describe the issue: I pruned the yolov7 model with L1NormPruner, following this guide: https://github.com/microsoft/nni/blob/master/examples/compression/pruning/norm_pruning.py . I added the following code after this line: https://github.com/WongKinYiu/yolov7/blob/84932d70fb9e2932d0a70e4a1f02a1d6dd1dd6ca/train.py#L100

    from nni.compression.pruning import L1NormPruner, L2NormPruner, FPGMPruner
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.utils import auto_set_denpendency_group_ids
    
    config_list = [{
        # 'total_sparsity': 0.1,
        'sparse_ratio': 0.5,
        'op_types': ['Conv2d'],
    }]
    
    dummy_input = torch.rand([1, 3, 640, 640]).to(device)
    config_list = auto_set_denpendency_group_ids(model, config_list, dummy_input)
    
    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()
    pruner.unwrap_model()

    model = ModelSpeedup(model, dummy_input, masks).speedup_model()
    torch.save(model, "pruning_nni_yolov7.pt")
    exit()

But I got this error:

        First diverging operator:
        Node diff:
                - %model : __torch__.torch.nn.modules.container.___torch_mangle_398.Sequential = prim::GetAttr[name="model"](%self.1)
                ?                                                               --
                + %model : __torch__.torch.nn.modules.container.___torch_mangle_812.Sequential = prim::GetAttr[name="model"](%self.1)
                ?                                                                ++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
        Node:
                %2445 : Tensor = prim::Constant[value={2}](), scope: __module.model.105 # /opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py:135:0
        Source Location:
                /opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(135): forward
                /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward
                /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl
                /opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(625): forward_once
                /opt/nvidia/deepstream/deepstream-6.2/sources/yolo_deepstream/FINAL_INVESTIGATION/yolov7_removestem/models/yolo.py(599): forward
                /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward
                /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl
                /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(1056): trace_module
                /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(794): trace
                /usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(91): _trace
                /usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(67): __init__
                /usr/local/lib/python3.8/dist-packages/nni/common/graph_utils.py(265): __init__
                /usr/local/lib/python3.8/dist-packages/nni/compression/utils/shape_dependency.py(58): __init__
                /usr/local/lib/python3.8/dist-packages/nni/compression/utils/shape_dependency.py(135): __init__
                /usr/local/lib/python3.8/dist-packages/nni/compression/utils/dependency.py(34): auto_set_denpendency_group_ids
                train_pruning_yolov7.py(112): train
                train_pruning_yolov7.py(639): <module>
        Comparison exception:   The values for attribute 'shape' do not match: torch.Size([]) != torch.Size([1, 1, 40, 40, 2]).

This error also occurs with L2NormPruner and FPGMPruner. I attached the log file: pruning_l1norm.log. @J-shang @ultmaster please help me.

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Ubuntu
  • Server OS (for remote mode only):
  • Python version: 3.8.10
  • PyTorch/TensorFlow version: 2.0.1+cu117
  • Is conda/virtualenv/venv used?: Docker
  • Is running in Docker?: Yes

Configuration:

  • Experiment config (remember to remove secrets!):
  • Search space:

Log message:

  • nnimanager.log:
  • dispatcher.log:
  • nnictl stdout and stderr:

How to reproduce it?:

  • Clone https://github.com/WongKinYiu/yolov7
  • Edit train.py as described above.
  • Run command
python3 train_pruning_yolov7.py --workers 8 --device 0 \
        --batch-size 1 --data data/coco.yaml \
        --img 640 640 --cfg cfg/training/yolov7.yaml \
        --weights 'yolov7.pt' --name yolov7_nni \
        --hyp data/hyp.scratch.custom.yaml --epochs 1

aidevmin avatar Sep 17 '23 09:09 aidevmin

I had a similar issue and solved it by putting the model in eval() and passing a dummy input through it once before pruning.
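In code, what I mean is roughly the sketch below (it assumes `model` is the already-built yolov7 nn.Module and `device` is the torch.device it lives on, as in your snippet; the NNI calls then follow unchanged):

    import torch

    # Assumption: `model` and `device` come from the surrounding train.py code.
    # Put the model in inference mode and run one warm-up forward pass so the
    # inference-time buffers (e.g. the detection-head grids) exist before NNI
    # traces the model.
    model.eval()
    dummy_input = torch.rand(1, 3, 640, 640, device=device)
    with torch.no_grad():
        model(dummy_input)

    # ...then run the pruning code from the issue description as before
    # (auto_set_denpendency_group_ids, L1NormPruner, ModelSpeedup).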

MarkusDrange avatar Sep 22 '23 14:09 MarkusDrange

@MarkusDrange Thank you so much for the suggestion. I am going to try it later. Instead of NNI, I used torch-pruning. Could you share your experience with pruning yolov7 using NNI? Is the trade-off between mAP and speed good?

aidevmin avatar Sep 22 '23 15:09 aidevmin

@MarkusDrange Can you save the pruned yolov7 model using torch.save(model, <path-to-save>)?

aidevmin avatar Sep 25 '23 03:09 aidevmin

Sorry, I am not really working on an identical case; I am tracing a yolov8 model and only mentioned the solution because the error message I got was very similar to yours.

A possible fix: the yolov7 model may, like my yolov8, wrap the actual network in a hierarchy of classes, so model.model might be the actual model you want to save.
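For example, something along these lines (an untested guess; whether model.model is the right attribute depends on the actual class hierarchy):

    # Hypothetical: if the pruned object is only a wrapper, save the inner module.
    torch.save(model.model, "pruning_nni_yolov7.pt")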

MarkusDrange avatar Sep 25 '23 08:09 MarkusDrange

@MarkusDrange Thanks. I have one more question: could you fine-tune with multiple GPUs after pruning?

aidevmin avatar Sep 30 '23 15:09 aidevmin

> I had a similar issue and solved it by putting the model in eval() and passing a dummy input through it once before pruning.

Can you share the implementation code?

Gooddz1 avatar Feb 01 '24 12:02 Gooddz1

> Sorry, I am not really working on an identical case; I am tracing a yolov8 model and only mentioned the solution because the error message I got was very similar to yours.
>
> A possible fix: the yolov7 model may, like my yolov8, wrap the actual network in a hierarchy of classes, so model.model might be the actual model you want to save.

I followed your approach but still got the same error. Can you share your implementation code?

Gooddz1 avatar Feb 26 '24 01:02 Gooddz1