
Pruning model speed up error

hitesh-hitu opened this issue 2 years ago • 14 comments

After using the pruning API on the model, when I try to perform the speed up, it throws this error:

Error: TypeError: forward() missing 1 required positional argument: input

Traceback: m_speedup.speedup_model()

File "/pruning/lib/python3.6/site-packages/nni/compression/pytorch/speedup/compressor.py", line 503, in speedup_model
    self.infer_modules_masks()
File "/pruning/Python-test/lib/python3.6/site-packages/nni/compression/pytorch/speedup/compressor.py", line 349, in infer_modules_masks
    self.update_direct_sparsity(curnode)
File "/pruning/Python-test/lib/python3.6/site-packages/nni/compression/pytorch/speedup/compressor.py", line 219, in update_direct_sparsity
    state_dict=copy.deepcopy(module.state_dict()), batch_dim=self.batch_dim)
File "/pruning/Python-test/lib/python3.6/site-packages/nni/compression/pytorch/speedup/infer_mask.py", line 80, in __init__
    self.output = self.module(*dummy_input)
File "/pruning/Python-test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)

hitesh-hitu avatar Mar 21 '22 10:03 hitesh-hitu

Hello @hitesh-hitu, which model are you pruning? We recently encountered some similar issues and will focus on fixing them in the next release.

J-shang avatar Mar 22 '22 02:03 J-shang

Hello @J-shang thank you so much for the speedy response.

I am trying to prune a PyTorch image-enhancement model with an input shape of (1, 3, 580, 580). The pruning API works fine and converts the weights to zeroes. This error comes up when I try to speed up the model.
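For reference, here is a minimal sketch of the flow I am following (the pruner choice and variable names are illustrative placeholders, not my exact script; model is my image-enhancement network):

import torch
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.pruning import L1NormPruner  # pruner choice is illustrative

config_list = [{'sparsity_per_layer': 0.5, 'op_types': ['Conv2d']}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()   # pruning itself works, weights are zeroed out
pruner._unwrap_model()         # unwrap the model before speed-up

dummy_input = torch.rand(1, 3, 580, 580)
m_speedup = ModelSpeedup(model, dummy_input, masks)
m_speedup.speedup_model()      # this call raises the TypeError above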

Can you please help me with this soon?

hitesh-hitu avatar Mar 22 '22 02:03 hitesh-hitu

PR #4149 has fixed the infer-mask forward missing a required positional argument, which is caused by a constant value being passed to the forward input. You could try basing on this PR, and if the issue still occurs, information about the model structure you used would help us fix it.

J-shang avatar Mar 22 '22 05:03 J-shang

@J-shang just one more clarification: as per the NNI docs, model speed-up is still in beta and its limitation is that it doesn't support speeding up all models. Is this error related to that? Does the latest version of NNI support speeding up all PyTorch models after pruning?

hitesh-hitu avatar Mar 22 '22 07:03 hitesh-hitu

@hitesh-hitu, your error is most likely because your model contains some special modules whose forward input arguments include constants. But we can't locate the real cause if you don't show your model.

For your second question: speed-up needs replacement logic implemented for each specific PyTorch module, and NNI has implemented most common modules (Conv2d, Linear, ...) right now. If you encounter a module that NNI does not support, you can report an issue and we will try to implement it.

J-shang avatar Mar 22 '22 07:03 J-shang

@J-shang thank you very much for the clarification. I'll check for the custom layers in my model.

Also, I applied the changes from PR #4149 on the latest version of NNI, but the error still occurs. I'll look for the special layer in the model. Can you please let me know where the changes have to be made/added to support a special layer in the speedup module?

hitesh-hitu avatar Mar 22 '22 13:03 hitesh-hitu

For a new module, you need to add a replace function in https://github.com/microsoft/nni/blob/51d261e7256e2344f8d4cf270bff439819945c9a/nni/compression/pytorch/speedup/compress_modules.py#L11
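To give a rough idea of what such a replace function does, here is a toy sketch for an nn.Linear layer; the argument convention (index tensors of kept channels) is an assumption for illustration, not NNI's actual interface — check the existing entries in compress_modules.py for the real signatures:

import torch
import torch.nn as nn

def replace_mylinear(module: nn.Linear, kept_in: torch.Tensor, kept_out: torch.Tensor) -> nn.Linear:
    # Build a smaller Linear that keeps only the surviving input/output channels.
    # kept_in / kept_out are 1-D index tensors derived from the masks (assumed here).
    new_layer = nn.Linear(len(kept_in), len(kept_out), bias=module.bias is not None)
    # copy only the surviving rows (outputs) and columns (inputs) of the weight
    new_layer.weight.data = module.weight.data[kept_out][:, kept_in].clone()
    if module.bias is not None:
        new_layer.bias.data = module.bias.data[kept_out].clone()
    return new_layer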

But your error occurs before module replacement; you can check which module causes this error, i.e. which module this forward function belongs to.

J-shang avatar Mar 23 '22 01:03 J-shang

I don't quite understand the issue. Can you please elaborate, and also point out where to fix it in the speed-up module?

hitesh-hitu avatar Mar 23 '22 03:03 hitesh-hitu

@hitesh-hitu do you have logs like the following before this error occurs?

[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for layers.0
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for downsample
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for skip
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for layers.1
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for layers.2
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for layers.3
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for .aten::cat.6
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the .aten::cat.6
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the skip
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the layers.3
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the layers.2
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the layers.1
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the downsample
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the layers.0
[2022-03-23 13:12:05] INFO (nni.compression.pytorch.speedup.compressor/MainThread) resolve the mask conflict

You could find from these logs which layer failed to update its mask, or you could give us simple code with which we can reproduce your error. We can't determine the cause of this error or how to fix it because we don't know your model implementation.
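Those INFO lines come from Python's standard logging module, so if they are not showing up in your run, a generic setup like this (not NNI-specific) usually makes them visible:

import logging

# Emit INFO-level messages from all loggers, including
# nni.compression.pytorch.speedup.compressor, to the console.
logging.basicConfig(level=logging.INFO)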

J-shang avatar Mar 23 '22 05:03 J-shang

Yes, I'll provide the logs soon; please keep this issue open.

hitesh-hitu avatar Mar 25 '22 16:03 hitesh-hitu

@J-shang I use NNI to prune a model trained with mmdetection and mmrazor. My code is:

import torch
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.pruning import L1NormPruner

config_list = [{
    'sparsity_per_layer': 0.5,
    'op_types': ['Conv2d'],
}]
pruner = L1NormPruner(model, config_list)
# compress the model and generate the masks
_, masks = pruner.compress()
print(masks)
# show the masks sparsity
for name, mask in masks.items():
    print(name, ' sparsity : ', '{:.2}'.format(mask['weight'].sum() / mask['weight'].numel()))
# need to unwrap the model, if the model is wrapped before speedup
pruner._unwrap_model()
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
ModelSpeedup(model, torch.rand(1, 3, 1333, 800).to(device), masks).speedup_model()

The error is shown in the attached screenshot.

Please give me some advice, thanks very much.

Audrey528 avatar Jun 16 '22 03:06 Audrey528

@Audrey528, did model(torch.rand(1, 3, 1333, 800)) work with your model? Could you find where your model uses img_metas?

J-shang avatar Jun 16 '22 03:06 J-shang

@J-shang Thanks very much for your reply. (1333, 800) is the input image size used in training. When doing inference on an image, the data looks like the attached screenshot. Should I change the input data format?

Audrey528 avatar Jun 16 '22 04:06 Audrey528

Yes, the dummy input is used for tracing the model graph; please refer to the usage of example_inputs in torch.jit.trace: https://pytorch.org/docs/stable/generated/torch.jit.trace.html
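A quick sanity check (assuming the model and device variables from the snippet above) is to call the model with the dummy input directly before constructing ModelSpeedup; if that call fails, the tracing inside speed-up will fail the same way:

import torch

dummy_input = torch.rand(1, 3, 1333, 800).to(device)
model.eval()
with torch.no_grad():
    # If this raises (e.g. because the detector's forward also expects img_metas),
    # the dummy input does not match what model.forward accepts and must be adapted.
    model(dummy_input)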

J-shang avatar Jun 16 '22 05:06 J-shang

Don't put the model on multiple GPUs when running pruning~

Bighhhzq avatar Mar 02 '23 07:03 Bighhhzq