
Support for aten::randn

Status: Open • maxin-cn opened this issue 3 years ago • 8 comments

Describe the issue:

When I tried to prune some detection models, I encountered an error saying that aten::randn is not supported. This operator is typically used to generate the proposals, i.e., proposals = torch.randn(1000, 4).to(img.device), so I think we need support for it.
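To illustrate (a minimal sketch, not part of my detector code): tracing a toy module that follows the same pattern records the factory call and the device transfer as aten:: nodes, which is what ModelSpeedup then has to interpret.

import torch

class ProposalStub(torch.nn.Module):
    """Toy module that follows the same proposal-generation pattern."""
    def forward(self, img):
        proposals = torch.randn(1000, 4).to(img.device)
        return img.sum() + proposals.sum()

# check_trace=False because randn makes the two check passes differ.
# The printed graph records the factory call and the device transfer
# as aten::randn / aten::to nodes.
traced = torch.jit.trace(ProposalStub(), torch.ones(1, 3, 32, 32), check_trace=False)
print(traced.graph)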

My code is based on mmdetection and the latest nni.

import torch
from argparse import ArgumentParser

from mmdet.apis import inference_detector, init_detector

from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils.counter import count_flops_params
from nni.algorithms.compression.v2.pytorch.pruning.basic_pruner import SlimPruner, L1NormPruner, FPGMPruner
from nni.compression.pytorch.utils import not_safe_to_prune

device = 'cuda:0'
config = 'configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.py'
checkpoint = None
img_file = 'demo/demo.JPEG'

# build the model from a config file and a checkpoint file
model = init_detector(config, checkpoint, device=device)

model.forward = model.forward_dummy

pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))

im = torch.ones(1, 3, 256, 256).to(device)
out = model(im)
torch.jit.trace(model, im, strict=False)

# with torch.no_grad():
#     input_name = ['input']
#     output_name  = ['output']
#     onnxname = 'fanet.onnx'
#     torch.onnx.export(model, im, onnxname, input_names = input_name, output_names = output_name,
#                     opset_version=11, training=False, verbose=False, do_constant_folding=False)
#     print(f'successful export onnx {onnxname}')
# exit()

# scores = model(return_loss=False, **data)
# scores = model(return_loss=False, **im)

# test a single image
# result = inference_model(model, img_file)

# Start to prune and speed up
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
not_safe = not_safe_to_prune(model, im)



print('\n' + '=' * 50 +  'not_safe' + '=' * 50, not_safe)
cfg_list = []
for name, module in model.named_modules():
    print(name)
    if name in not_safe:
        continue
    if isinstance(module, torch.nn.Conv2d):
        cfg_list.append({'op_types':['Conv2d'], 'sparsity':0.2, 'op_names':[name]})

print('cfg_list')
for i in cfg_list:
    print(i)

pruner = FPGMPruner(model, cfg_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
pruner.show_pruned_weights()

ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
torch.jit.trace(model, im, strict=False)
print(model)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M')
model.forward = model.forward_
torch.save(model, '***.pth')

And the error is:

[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.0.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.1.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.2.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.convs.3.conv remain/total: 205/256
[2022-07-15 17:24:58] simulated prune roi_head.mask_head.2.conv_logits remain/total: 64/80
[2022-07-15 17:25:00] start to speedup the model
[2022-07-15 17:25:03] infer module masks...
backbone.conv1
[2022-07-15 17:25:03] Update mask for backbone.conv1
.aten::ones.239
[2022-07-15 17:25:03] Update mask for .aten::ones.239
[2022-07-15 17:25:03] ERROR: aten::ones is not Supported! Please report an issue at https://github.com/microsoft/nni. Thanks~
backbone.bn1
[2022-07-15 17:25:03] Update mask for backbone.bn1
.aten::to.240
[2022-07-15 17:25:03] Update mask for .aten::to.240
Traceback (most recent call last):
  File "fpgm_pruning.py", line 68, in <module>
    ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 537, in speedup_model
    self.infer_modules_masks()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 372, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/compressor.py", line 233, in update_direct_sparsity
    func, dummy_input, in_masks, in_constants=in_constants, batch_dim=self.batch_dim)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-basecv/maxin/work/roadseg/nni/compression/pytorch/speedup/infer_mask.py", line 80, in __init__
    self.output = self.module(*dummy_input)
  File "/workdir/maxin/anaconda3/envs/roadseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'

Environment: conda

  • NNI version: the latest
  • Training service (local|remote|pai|aml|etc): remote
  • Client OS:
  • Server OS (for remote mode only): centos 7
  • Python version:
  • PyTorch/TensorFlow version:
  • Is conda/virtualenv/venv used?:
  • Is running in Docker?:

Configuration:

  • Experiment config (remember to remove secrets!):
  • Search space:

Log message:

  • nnimanager.log:
  • dispatcher.log:
  • nnictl stdout and stderr:

How to reproduce it?:

maxin-cn · Jul 15 '22 09:07

It crashed here: [2022-07-15 17:25:03] Update mask for .aten::to.240. I think the bug is not in the operator 'aten::randn' but in 'aten::to'.

Louis-J · Jul 18 '22 02:07

It crashed here: [2022-07-15 17:25:03] Update mask for .aten::to.240. I think the bug is not in the operator 'aten::randn' but in 'aten::to'.

Yes, I modified forward_dummy and hit the same problem. I think it is caused by proposals = proposals.to(img.device).

def forward_dummy(self, img, proposals=torch.randn(1000, 4)):
    """Used for computing network flops.

    See `mmdetection/tools/analysis_tools/get_flops.py`
    """
    outs = ()
    # backbone
    x = self.extract_feat(img)
    # rpn
    if self.with_rpn:
        rpn_outs = self.rpn_head(x)
        outs = outs + (rpn_outs, )
    # proposals = torch.randn(1000, 4).to(img.device)
    # proposals = torch.ones(1000, 4).to(img.device) # for pruning
    proposals = proposals.to(img.device)
    # roi_head
    roi_outs = self.roi_head.forward_dummy(x, proposals)
    outs = outs + (roi_outs, )
    return outs

The error is:

[2022-07-18 10:43:44] Update mask for .aten::to.239
Traceback (most recent call last):
  File "fpgm_pruning.py", line 69, in <module>
    ModelSpeedup(model, dummy_input=im, masks_file=masks, confidence=32).speedup_model()
  File "nni/compression/pytorch/speedup/compressor.py", line 537, in speedup_model
    self.infer_modules_masks()
  File "nni/compression/pytorch/speedup/compressor.py", line 372, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "nni/compression/pytorch/speedup/compressor.py", line 233, in update_direct_sparsity
    func, dummy_input, in_masks, in_constants=in_constants, batch_dim=self.batch_dim)
  File "nni/compression/pytorch/speedup/infer_mask.py", line 81, in __init__
    self.output = self.module(*dummy_input)
  File "anaconda3/envs/roadseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'

maxin-cn · Jul 18 '22 02:07

#4945

Louis-J · Jul 19 '22 11:07

#4945

I used the auto speedup feature to prune this detection model and got the following error:

[2022-07-19 19:24:13] Update mask for backbone.layer1.0.conv1
[2022-07-19 19:24:13] Update mask for backbone.layer1.0.downsample.0
[2022-07-19 19:24:13] Update mask for .aten::new_full.251
Traceback (most recent call last):
  File "fpgm_pruning.py", line 77, in <module>
    ModelSpeedup(model, dummy_input=inputs_combined, masks_file=masks, confidence=16).speedup_model()
  File "/nni/compression/pytorch/speedup/compressor.py", line 536, in speedup_model
    self.infer_modules_masks()
  File "/nni/compression/pytorch/speedup/compressor.py", line 371, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "/nni/compression/pytorch/speedup/compressor.py", line 225, in update_direct_sparsity
    func = jit_to_python_function(node, self)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 473, in jit_to_python_function
    return trans_func_dict[node.op_type](node, speedup)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 413, in generate_aten_to_python
    for f in fs: keyword[p] = f(keyword[p])
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 279, in dtype_trans
    raise TypeError("Unimplemented scalar type")
TypeError: Unimplemented scalar type

maxin-cn · Jul 19 '22 11:07

#4945

I used the auto speedup feature to prune this detection model and got the following error:

[2022-07-19 19:24:13] Update mask for backbone.layer1.0.conv1
[2022-07-19 19:24:13] Update mask for backbone.layer1.0.downsample.0
[2022-07-19 19:24:13] Update mask for .aten::new_full.251
Traceback (most recent call last):
  File "fpgm_pruning.py", line 77, in <module>
    ModelSpeedup(model, dummy_input=inputs_combined, masks_file=masks, confidence=16).speedup_model()
  File "/nni/compression/pytorch/speedup/compressor.py", line 536, in speedup_model
    self.infer_modules_masks()
  File "/nni/compression/pytorch/speedup/compressor.py", line 371, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "/nni/compression/pytorch/speedup/compressor.py", line 225, in update_direct_sparsity
    func = jit_to_python_function(node, self)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 473, in jit_to_python_function
    return trans_func_dict[node.op_type](node, speedup)
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 413, in generate_aten_to_python
    for f in fs: keyword[p] = f(keyword[p])
  File "/nni/compression/pytorch/speedup/jit_translate.py", line 279, in dtype_trans
    raise TypeError("Unimplemented scalar type")
TypeError: Unimplemented scalar type
The failing check is in dtype_trans, in nni/compression/pytorch/speedup/jit_translate.py:

def dtype_trans(ivalue: int | torch.dtype):
    """
    Special process for dtype.
    Torch will transform dtype to an enum in cpp, so the value of dtype we get in jit is an int.
    This function is used to recover the int to torch.dtype in python.

    Parameters
    ----------
    ivalue:
        The value of dtype or method to be recovered.

    """
    print(ivalue)
    if ivalue is None or type(ivalue) is torch.dtype:
        return ivalue
    elif type(ivalue) is int:
        global enum2dtype_dict
        if ivalue not in enum2dtype_dict:
            raise TypeError("Unimplemented scalar type")
        return enum2dtype_dict[ivalue]
    else:
        raise TypeError("Unimplemented scalar type")

I printed ivalue and it is False, which is not supported at the moment.
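For reference, a sketch of a more lenient check (an idea only, not the official NNI fix; enum2dtype_dict is passed in as a parameter here just to keep the snippet self-contained):

import torch

# Why the pasted dtype_trans rejects False: bool is not exactly int,
# so neither branch of the original type check matches and it raises.
print(type(False) is int)        # False: the exact type check fails
print(isinstance(False, int))    # True: bool is a subclass of int

def dtype_trans_lenient(ivalue, enum2dtype_dict):
    # Sketch only: let stray bools (possibly a pin_memory / non_blocking
    # flag routed through this helper) pass through unchanged.
    if ivalue is None or isinstance(ivalue, (torch.dtype, bool)):
        return ivalue
    if isinstance(ivalue, int):
        if ivalue not in enum2dtype_dict:
            raise TypeError("Unimplemented scalar type: %r" % ivalue)
        return enum2dtype_dict[ivalue]
    raise TypeError("Unimplemented scalar type: %r" % type(ivalue))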

maxin-cn · Jul 20 '22 07:07

Please provide the file 'cascade_mask_rcnn_r50_fpn_1x_coco.py'.

Louis-J · Jul 22 '22 04:07

Please provide the file 'cascade_mask_rcnn_r50_fpn_1x_coco.py'.

You can find this file in this repository (https://github.com/XinMa-AI/detection, configs/cascade_rcnn).

maxin-cn · Jul 25 '22 07:07

Sorry for not replying these days. In version 2.9 we added a feature that automatically translates ops in the aten:: namespace. 'randn_like', 'rand_like' and 'ones_like' are supported now, but 'randn', 'rand' and 'ones' still cannot be executed correctly.

You can use 'randn_like' for now to work around the bug, and we are trying to fix this problem in #5017.
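For example, a sketch (untested against this model) of how the forward_dummy above could be rewritten to rely on the supported randn_like op and drop the problematic .to() call:

import torch

def forward_dummy(self, img, proposals=torch.randn(1000, 4)):
    """Sketch only: generate the dummy proposals via torch.randn_like so the
    traced graph contains aten::randn_like instead of aten::randn/aten::to."""
    outs = ()
    x = self.extract_feat(img)
    if self.with_rpn:
        rpn_outs = self.rpn_head(x)
        outs = outs + (rpn_outs, )
    # randn_like accepts a device keyword, so this one call replaces both
    # torch.randn(1000, 4) and the later proposals.to(img.device)
    proposals = torch.randn_like(proposals, device=img.device)
    roi_outs = self.roi_head.forward_dummy(x, proposals)
    outs = outs + (roi_outs, )
    return outs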

Louis-J · Sep 09 '22 06:09