Support for aten::randn
Describe the issue:
When I tried to prune some detection models, I encountered an error saying that aten::randn is not supported. This operator is commonly used to generate proposals, e.g. proposals = torch.randn(1000, 4).to(img.device). Thus, I think we need support for it.
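For reference, here is a minimal sketch (a toy module, not the mmdetection code below; all names are illustrative) showing how this pattern enters the traced graph:

import torch
import torch.nn as nn

class ToyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, img):
        feat = self.conv(img)
        # the same pattern used for proposal generation in detection models
        proposals = torch.randn(1000, 4).to(img.device)
        return feat, proposals

traced = torch.jit.trace(ToyHead(), torch.ones(1, 3, 32, 32))
print(traced.graph)  # the graph should contain aten::randn (and aten::to) nodes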
My code is based on mmdetection and the latest nni.
import torch
from argparse import ArgumentParser
from mmdet.apis import inference_detector, init_detector
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils.counter import count_flops_params
from nni.algorithms.compression.v2.pytorch.pruning.basic_pruner import SlimPruner, L1NormPruner, FPGMPruner
from nni.compression.pytorch.utils import not_safe_to_prune
device = 'cuda:0'
config = 'configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.py'
checkpoint = None
img_file = 'demo/demo.JPEG'
# build the model from a config file and a checkpoint file
model = init_detector(config, checkpoint, device=device)
model.forward = model.forward_dummy
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
im = torch.ones(1, 3, 256, 256).to(device)
out = model(im)
torch.jit.trace(model, im, strict=False)
# with torch.no_grad():
#     input_name = ['input']
#     output_name = ['output']
#     onnxname = 'fanet.onnx'
#     torch.onnx.export(model, im, onnxname, input_names=input_name, output_names=output_name,
#                       opset_version=11, training=False, verbose=False, do_constant_folding=False)
#     print(f'successful export onnx {onnxname}')
#     exit()
# scores = model(return_loss=False, **data)
# scores = model(return_loss=False, **im)
# test a single image
# result = inference_model(model, img_file)
# Start to prune and speed up
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
not_safe = not_safe_to_prune(model, im)
print('\n' + '=' * 50 + 'not_safe' + '=' * 50, not_safe)
cfg_list = []
for name, module in model.named_modules():
    print(name)
    if name in not_safe:
        continue
    if isinstance(module, torch.nn.Conv2d):
        cfg_list.append({'op_types': ['Conv2d'], 'sparsity': 0.2, 'op_names': [name]})
print('cfg_list')
for i in cfg_list:
    print(i)
pruner = FPGMPruner(model, cfg_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
pruner.show_pruned_weights()
ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
torch.jit.trace(model, im, strict=False)
print(model)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M')
model.forward = model.forward_
torch.save(model, '***.pth')
And the error is:
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.0.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.1.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.2.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.3.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.conv_logits remain/total: 64/80
[2022-07-15 17:25:00] start to speedup the model
[2022-07-15 17:25:03] infer module masks...
backbone.conv1
[2022-07-15 17:25:03] Update mask for backbone.conv1
.aten::ones.239
[2022-07-15 17:25:03] Update mask for .aten::ones.239
[2022-07-15 17:25:03] ERROR: aten::ones is not Supported! Please report an issue at https://github.com/microsoft/nni. Thanks~
backbone.bn1
[2022-07-15 17:25:03] Update mask for backbone.bn1
.aten::to.240
[2022-07-15 17:25:03] Update mask for .aten::to.240
Traceback (most recent call last):
  File "fpgm_pruning.py", line 68, in <module>
    ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 537, in speedup_model
    self.infer_modules_masks()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 372, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 233, in update_direct_sparsity
    func, dummy_input, in_masks, in_constants=in_constants, batch_dim=self.batch_dim)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/infer_mask.py", line 80, in __init__
    self.output = self.module(*dummy_input)
  File "/workdir/maxin/anaconda3/envs/roadseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'
Environment: conda
- NNI version: the latest
- Training service (local|remote|pai|aml|etc): remote
- Client OS:
- Server OS (for remote mode only): centos 7
- Python version:
- PyTorch/TensorFlow version:
- Is conda/virtualenv/venv used?:
- Is running in Docker?:
Configuration:
- Experiment config (remember to remove secrets!):
- Search space:
Log message:
- nnimanager.log:
- dispatcher.log:
- nnictl stdout and stderr:
How to reproduce it?:
It crashed here: [2022-07-15 17:25:03] Update mask for .aten::to.240
I think the bug is not in the operator 'aten::randn' but in 'aten::to'.
Yes, I modified forward_dummy and hit the same problem. I think it is caused by proposals = proposals.to(img.device).
def forward_dummy(self, img, proposals=torch.randn(1000, 4)):
    """Used for computing network flops.

    See `mmdetection/tools/analysis_tools/get_flops.py`
    """
    outs = ()
    # backbone
    x = self.extract_feat(img)
    # rpn
    if self.with_rpn:
        rpn_outs = self.rpn_head(x)
        outs = outs + (rpn_outs, )
    # proposals = torch.randn(1000, 4).to(img.device)
    # proposals = torch.ones(1000, 4).to(img.device)  # for pruning
    proposals = proposals.to(img.device)
    # roi_head
    roi_outs = self.roi_head.forward_dummy(x, proposals)
    outs = outs + (roi_outs, )
    return outs
The error is:
[2022-07-18 10:43:44] Update mask for .aten::to.239
Traceback (most recent call last):
  File "fpgm_pruning.py", line 69, in <module>
    ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
  File "nni/compression/pytorch/speedup/compressor.py", line 537, in speedup_model
    self.infer_modules_masks()
  File "nni/compression/pytorch/speedup/compressor.py", line 372, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "nni/compression/pytorch/speedup/compressor.py", line 233, in update_direct_sparsity
    func, dummy_input, in_masks, in_constants=in_constants, batch_dim=self.batch_dim)
  File "nni/compression/pytorch/speedup/infer_mask.py", line 81, in __init__
    self.output = self.module(*dummy_input)
  File "anaconda3/envs/roadseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'
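The forward_dummy above still creates an aten::to node via proposals.to(img.device). One possible workaround (a sketch, under the assumption that the dummy proposals only need a fixed shape, not fresh randomness per call): build them once outside the traced forward, so the tracer captures them as a constant and emits no aten::randn / aten::to node. make_forward_dummy is a hypothetical helper name:

import types
import torch

def make_forward_dummy(proposals):
    # proposals is captured by the closure; the tracer records it as a
    # constant instead of emitting aten::randn / aten::to nodes
    def forward_dummy(self, img):
        outs = ()
        x = self.extract_feat(img)
        if self.with_rpn:
            outs = outs + (self.rpn_head(x), )
        roi_outs = self.roi_head.forward_dummy(x, proposals)
        return outs + (roi_outs, )
    return forward_dummy

# build the dummy proposals once, directly on the target device
model.forward = types.MethodType(
    make_forward_dummy(torch.randn(1000, 4, device=device)), model)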
#4945
I used the auto-speedup feature to prune this detection model and hit the following error:
[2022-07-19 19:24:13] Update mask for backbone.layer1.0.conv1
[2022-07-19 19:24:13] Update mask for backbone.layer1.0.downsample.0
[2022-07-19 19:24:13] Update mask for .aten::new_full.251
Traceback (most recent call last):
  File "fpgm_pruning.py", line 77, in <module>
    ModelSpeedup(model, dummy_input=inputs_combined, masks_file=masks, confidence=16).speedup_model()
  File "/nni/compression/pytorch/speedup/compressor.py", line 536, in speedup_model
    self.infer_modules_masks()
  File "/nni/compression/pytorch/speedup/compressor.py", line 371, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "/nni/compression/pytorch/speedup/compressor.py", line 225, in update_direct_sparsity
    func = jit_to_python_function(node, self)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 473, in jit_to_python_function
    return trans_func_dict[node.op_type](node, speedup)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 413, in generate_aten_to_python
    for f in fs: keyword[p] = f(keyword[p])
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 279, in dtype_trans
    raise TypeError("Unimplemented scalar type")
TypeError: Unimplemented scalar type
def dtype_trans(ivalue: int | torch.dtype):
    """
    Special process for dtype.
    Torch will transform dtype to an enum in cpp, so the value of dtype we get in jit is an int.
    This function is used to recover the int to torch.dtype in python.

    Parameters
    ----------
    ivalue:
        The value of dtype or method to be recovered.
    """
    print(ivalue)
    if ivalue is None or type(ivalue) is torch.dtype:
        return ivalue
    elif type(ivalue) is int:
        global enum2dtype_dict
        if ivalue not in enum2dtype_dict:
            raise TypeError("Unimplemented scalar type")
        return enum2dtype_dict[ivalue]
    else:
        raise TypeError("Unimplemented scalar type")
I printed ivalue and it is False, a value dtype_trans does not handle yet.
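The False arrives through the slot where dtype_trans expects a dtype enum (for aten::to this can be a bool flag such as non_blocking). A minimal sketch of a more tolerant variant, assuming non-dtype scalars such as bools should simply pass through; this is only an illustration, not the fix that eventually landed in nni:

import torch

# subset of the c10::ScalarType enum ordering used by the JIT
enum2dtype_dict = {0: torch.uint8, 1: torch.int8, 2: torch.int16,
                   3: torch.int32, 4: torch.int64, 5: torch.float16,
                   6: torch.float32, 7: torch.float64, 11: torch.bool}

def dtype_trans_tolerant(ivalue):
    """Like dtype_trans, but passes non-dtype scalars (e.g. the bool
    non_blocking argument of aten::to) through unchanged."""
    if ivalue is None or isinstance(ivalue, torch.dtype):
        return ivalue
    if isinstance(ivalue, bool):  # bool is a subclass of int: check it first
        return ivalue
    if isinstance(ivalue, int) and ivalue in enum2dtype_dict:
        return enum2dtype_dict[ivalue]
    raise TypeError(f"Unimplemented scalar type: {ivalue!r}")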
Please provide the file 'cascade_mask_rcnn_r50_fpn_1x_coco.py'.
You can find this file in this repository (https://github.com/XinMa-AI/detection, configs/cascade_rcnn).
Sorry for the late reply. In version 2.9 we added a feature that automatically translates the ops in the aten:: namespace: 'randn_like', 'rand_like', and 'ones_like' are supported now, but 'randn', 'rand', and 'ones' still cannot be executed correctly.
You can use 'randn_like' for now to avoid the bug; we are working on a fix in #5017.
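Applied to the forward_dummy above, the suggested workaround might look like this sketch; the buffer name dummy_proposals is illustrative, and it assumes a registered buffer (which follows the model's device) is an acceptable template tensor:

import types
import torch

# a (1000, 4) template living on the model's device
model.register_buffer('dummy_proposals', torch.zeros(1000, 4, device=device))

def forward_dummy(self, img):
    outs = ()
    x = self.extract_feat(img)
    if self.with_rpn:
        outs = outs + (self.rpn_head(x), )
    # randn_like inherits shape/device/dtype from the template, so no
    # aten::randn or aten::to node appears in the trace
    proposals = torch.randn_like(self.dummy_proposals)
    roi_outs = self.roi_head.forward_dummy(x, proposals)
    return outs + (roi_outs, )

model.forward = types.MethodType(forward_dummy, model)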