mmdeploy

[Bug] DeformConv2dFunction is not exportable to ONNX IR when padding is int

Open jakubhejhal opened this issue 1 year ago • 0 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. I have read the FAQ documentation but cannot get the expected help.
  • [X] 3. The bug has not been fixed in the latest version.

Describe the bug

In mmcv, the padding argument of DeformConv2dFunction is typed as Union[int, Tuple[int, ...]]: https://github.com/open-mmlab/mmcv/blob/d9e10e11846d911e8354cd024967d3a17a88083c/mmcv/ops/deform_conv.py#L77

The symbolic rewriter in mmdeploy, however, iterates over padding and therefore only works when it is a tuple (a pair), not an int: https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/mmcv/ops/deform_conv.py#L25

As a result, exporting the TOOD model fails with the following exception:

...
qi-inference-module  |   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1708, in _run_symbolic_method
qi-inference-module  |     return symbolic_fn(graph_context, *args)
qi-inference-module  |   File "/venv/lib/python3.10/site-packages/mmdeploy/mmcv/ops/deform_conv.py", line 25, in deform_conv__default
qi-inference-module  |     padding_i=[p for pair in zip(padding, padding) for p in pair],
qi-inference-module  | TypeError: 'int' object is not iterable (occurred when translating DeformConv2dFunction)
qi-inference-module  | 02/15 06:58:11 - mmengine - ERROR - /venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.

This happens because the TOOD head (tood_head.py) calls deform_conv2d with an int padding argument: https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/dense_heads/tood_head.py#L313
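
One possible fix, sketched below in plain Python, is to normalize padding to a pair before the rewriter iterates over it. This is only an illustration under assumptions, not the actual mmdeploy code: the helper names to_pair and interleaved_padding are hypothetical, and only the list comprehension is taken from the traceback above.

```python
from typing import List, Tuple, Union

IntOrPair = Union[int, Tuple[int, ...]]


def to_pair(value: IntOrPair) -> Tuple[int, int]:
    """Hypothetical helper: expand an int to (v, v); pass a pair through."""
    if isinstance(value, int):
        return (value, value)
    pair = tuple(value)
    if len(pair) != 2:
        raise ValueError(f"expected an int or a pair, got {value!r}")
    return pair


def interleaved_padding(padding: IntOrPair) -> List[int]:
    """Apply the expression from the traceback after normalizing padding."""
    padding = to_pair(padding)
    # Same comprehension as mmdeploy/mmcv/ops/deform_conv.py line 25 in the log.
    return [p for pair in zip(padding, padding) for p in pair]


if __name__ == "__main__":
    # Without normalization, zip() over an int raises TypeError,
    # which is the failure reported in this issue.
    try:
        [p for pair in zip(1, 1) for p in pair]
    except TypeError as exc:
        print("unnormalized int padding fails:", exc)

    print(interleaved_padding(1))
    print(interleaved_padding((1, 2)))
```

The same normalization could presumably be applied to the other int-or-tuple arguments (stride, dilation) if the rewriter iterates over them too; torch.nn.modules.utils._pair performs a similar expansion and may be a drop-in alternative to the hypothetical helper.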

Reproduction

We integrate mmdeploy and mmdetection into our proprietary codebase, so I cannot post any code here. We only call torch2onnx from mmdeploy.apis.

Environment

The environment is proprietary.

Error traceback

02/15 06:58:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
 02/15 06:58:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
 Loads checkpoint by local backend from path: /tmp/tmp8ong2sd5/mmdet_checkpoint.pth
 02/15 06:58:05 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
 02/15 06:58:05 - mmengine - INFO - Export PyTorch model to ONNX: /app/models/tood/end2end.onnx.
 02/15 06:58:05 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
 02/15 06:58:05 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
 /venv/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models/detectors/single_stage.py:80: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
   img_shape = [int(val) for val in img_shape]
 /venv/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models/detectors/single_stage.py:80: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   img_shape = [int(val) for val in img_shape]
 /venv/lib/python3.10/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   ys_shape = tuple(int(s) for s in ys.shape)
 /venv/lib/python3.10/site-packages/mmcv/ops/deform_conv.py:218: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   if not all(map(lambda s: s > 0, output_size)):
 /venv/lib/python3.10/site-packages/mmcv/ops/deform_conv.py:114: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   int(i)
 /venv/lib/python3.10/site-packages/mmcv/ops/deform_conv.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   cur_im2col_step = min(ctx.im2col_step, input.size(0))
 /venv/lib/python3.10/site-packages/mmcv/ops/deform_conv.py:121: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   assert (input.size(0) % cur_im2col_step
 /venv/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
 /venv/lib/python3.10/site-packages/mmdeploy/pytorch/functions/topk.py:58: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   if k > size:
 /venv/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   assert pred_bboxes.size(0) == bboxes.size(0)
 /venv/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:40: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   assert pred_bboxes.size(1) == bboxes.size(1)
 /venv/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:474: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   int(scores.shape[-1]),
 /venv/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:148: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
   out_boxes = min(num_boxes, after_topk)
 [W shape_type_inference.cpp:1920] Warning: The shape inference of mmdeploy::GridPriorsTRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
 ============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
 verbose: False, log level: Level.ERROR
 ======================= 0 NONE 0 NOTE 1 WARNING 0 ERROR ========================
 1 WARNING were not printed due to the log level.
 
 Process Process-1:2:
 Traceback (most recent call last):
   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
     self.run()
   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
     self._target(*self._args, **self._kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
     ret = func(*args, **kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
     export(
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
     return self.call_function(func_name_, *args, **kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
     return self.call_function_local(func_name, *args, **kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
     return pipe_caller(*args, **kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
     ret = func(*args, **kwargs)
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/onnx/export.py", line 138, in export
     torch.onnx.export(
   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 506, in export
     _export(
   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1548, in _export
     graph, params_dict, torch_out = _model_to_graph(
   File "/venv/lib/python3.10/site-packages/mmdeploy/apis/onnx/optimizer.py", line 27, in model_to_graph__custom_optimizer
     graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
     graph = _optimize_graph(
   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
     graph = _C._jit_pass_onnx(graph, operator_export_type)
   File "/venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1708, in _run_symbolic_method
     return symbolic_fn(graph_context, *args)
   File "/venv/lib/python3.10/site-packages/mmdeploy/mmcv/ops/deform_conv.py", line 25, in deform_conv__default
     padding_i=[p for pair in zip(padding, padding) for p in pair],
 TypeError: 'int' object is not iterable (occurred when translating DeformConv2dFunction)
 02/15 06:58:11 - mmengine - ERROR - /venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.

jakubhejhal, Feb 19 '24 15:02