mmdeploy RuntimeError: CUDA error: invalid configuration argument

I'd like to deploy an mmrotate model to a JetPack device.

Referring to https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/01-how-to-build/jetsons.md, I nearly finished installing the environment on JetPack 4.6.1 but gave up, because mmrotate only supports Python 3.7+.

I reflashed the system to JetPack 5.0. After hitting and fixing a few environment errors, I finished building the environment. I then tested the mmdetection demo and it ran without problems.

I then turned to mmrotate and built it following its readme.md. I followed https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/04-supported-codebases/mmrotate.md to deploy, but got the following error:

python tools/deploy.py configs/mmrotate/rotated-detection_onnxruntime_dynamic.py $MMROTATE_DIR/configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le135.py $MMROTATE_DIR/checkpoints/rotated_retinanet_obb_r50_fpn_1x_dota_le135-e4131166.pth $MMROTATE_DIR/demo/demo.jpg --work-dir work-dirs/mmrotate/rotated_retinanet/ort --show --device cuda:0

[2022-08-12 16:19:20.136] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
[2022-08-12 16:19:27.949] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
[2022-08-12 16:19:35.684] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-08-12 16:19:35,711 - mmdeploy - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
load checkpoint from local path: /home/me/project/mm-env/mmrotate/checkpoints/rotated_retinanet_obb_r50_fpn_1x_dota_le135-e4131166.pth
2022-08-12 16:19:52,400 - mmdeploy - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
2022-08-12 16:19:52,401 - mmdeploy - INFO - Export PyTorch model to ONNX: work-dirs/mmrotate/rotated_retinanet/ort/end2end.onnx.
/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py:158: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/me/project/mm-env/mmdeploy/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/me/project/mm-env/mmrotate/mmrotate/core/bbox/coder/delta_xywha_rbbox_coder.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(0) == bboxes.size(0)
Process Process-2:
Traceback (most recent call last):
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in call
    ret = func(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 92, in torch2onnx
    export(
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in wrap
    return self.call_function(func_name, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in call
    ret = func(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/onnx/export.py", line 122, in export
    torch.onnx.export(
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/init.py", line 319, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 113, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 716, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 10, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 496, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 388, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1099, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 70, in base_detector__forward
    return __forward_impl(ctx, self, img, img_metas=img_metas, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 26, in __forward_impl
    return self.simple_test(img, img_metas, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/models/single_stage_rotated_detector.py", line 32, in single_stage_rotated_detector__simple_test
    outs = self.bbox_head.get_bboxes(*outs, img_metas, rescale=rescale)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/models/dense_heads/rotated_anchor_head.py", line 132, in rotated_anchor_head__get_bbox
    return multiclass_nms_rotated(
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/core/post_processing/bbox_nms.py", line 178, in multiclass_nms_rotated
    return mmdeploy.codebase.mmrotate.core.post_processing.bbox_nms.
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/core/post_processing/bbox_nms.py", line 118, in _multiclass_nms_rotated
    selected_indices = ONNXNMSRotatedOp.apply(boxes, scores, iou_threshold,
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/mmcv/ops/nms_rotated.py", line 42, in forward
    box_inds = ext_module.nms_rotated(_boxes, _scores, order,
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-08-12 16:20:01,249 - mmdeploy - ERROR - mmdeploy.apis.pytorch2onnx.torch2onnx with Call id: 0 failed. exit.
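
The error message above suggests passing CUDA_LAUNCH_BLOCKING=1 so that a failing kernel is reported at the call that launched it. A minimal Python sketch of one way to prepare that environment, assuming the variable is in place before the process initializes CUDA; `enable_blocking_launches` is a hypothetical helper and not part of mmdeploy:

```python
import os


def enable_blocking_launches(env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper, not part of mmdeploy. The variable only has an
    effect if it is set before the process initializes CUDA.
    """
    env = dict(os.environ if env is None else env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# The returned mapping can be passed as `env=` to subprocess.run when
# re-running tools/deploy.py with the same arguments as the command above.
debug_env = enable_blocking_launches({"PATH": "/usr/bin"})
print(debug_env["CUDA_LAUNCH_BLOCKING"])
```

Copying the environment rather than editing os.environ in place keeps the setting limited to the debugging run.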

Aug 12 '22 09:08 Im-JimmyHu

Normally that means something is wrong with the launch configuration of the CUDA kernel. Please add some logging at https://github.com/open-mmlab/mmdeploy/blob/670a5045022d4c541b0b61027a4a975d8b18da01/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L434 to see the values of GS, BS and t_size. Also note that a Jetson device may have fewer resources than a server GPU, so it would be a good idea to limit the number of boxes passed to NMS.
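
One way to limit the number of boxes passed to NMS is through the post-processing settings of the deploy config. The fragment below is a sketch only: it assumes the mmrotate deploy config exposes `pre_top_k` and `keep_top_k` under `codebase_config.post_processing`, and the numbers are placeholders rather than tuned values, so the keys should be checked against the config file actually in use.

```python
# Illustrative override of the post-processing limits in a deploy config.
# Key names are assumed to match the mmrotate deploy config in use;
# the numbers are placeholders, not recommended values.
codebase_config = dict(
    type='mmrotate',
    task='RotatedDetection',
    post_processing=dict(
        score_threshold=0.05,
        iou_threshold=0.1,
        pre_top_k=1000,   # boxes kept before NMS (assumed key)
        keep_top_k=500,   # boxes kept after NMS (assumed key)
    ),
)

post = codebase_config['post_processing']
# Keeping no more boxes after NMS than before it is the usual arrangement.
assert post['keep_top_k'] <= post['pre_top_k']
```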

Aug 12 '22 09:08 grimoire

@grimoire Following your advice, I added the line printf("\nGS:%d,BS:%d,t_size_%d\n",GS,BS,t_size); at line 433 to print GS, BS and t_size, but the log does not show this output.

I worked out the values by calculation from https://github.com/open-mmlab/mmdeploy/blob/670a5045022d4c541b0b61027a4a975d8b18da01/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L429

BS = 512, GS = 15

Besides, even when I reduce BS from 512 to 100, it does not help.
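
For reference, the grid and block sizes reported above can be compared against the usual CUDA launch limits. The sketch below is a host-side sanity check in Python, not mmdeploy code. The limits used (1024 threads per block, a non-zero grid of at most 2^31 - 1 blocks in x) are the general figures for current NVIDIA GPUs and are an assumption for this particular Jetson device; `check_launch` is a hypothetical helper.

```python
MAX_THREADS_PER_BLOCK = 1024   # assumed device limit
MAX_GRID_X = 2**31 - 1         # assumed device limit


def check_launch(gs, bs):
    """Return a list of reasons why a <<<gs, bs>>> launch would be rejected."""
    problems = []
    if gs < 1:
        problems.append("grid size must be at least 1")
    elif gs > MAX_GRID_X:
        problems.append("grid size exceeds the device limit")
    if bs < 1:
        problems.append("block size must be at least 1")
    elif bs > MAX_THREADS_PER_BLOCK:
        problems.append("block size exceeds the device limit")
    return problems


for bs in (512, 100):
    print(f"GS=15, BS={bs}:", check_launch(15, bs) or "within limits")
```

Both settings pass this check, which is consistent with the change from 512 to 100 making no difference. A launch can still be rejected for other reasons, such as a dimension that becomes zero for a given input or a shared-memory request the device cannot satisfy.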

Aug 12 '22 10:08 Im-JimmyHu

What about t_size?

Aug 15 '22 02:08 grimoire

what about t_size?

Because the log doesn't print the values, I can't read t_size directly. I also can't find the value of top_k inside the function, so I can't compute it from t_size = (top_k + BS - 1) / BS;
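
The formula quoted above is a ceiling division: t_size is the number of BS-sized chunks needed to cover top_k boxes. A small Python sketch of the arithmetic follows; the top_k values are hypothetical, since the real value was not recovered in this thread, and `t_size_for` is an illustrative name.

```python
def t_size_for(top_k, bs):
    """Ceiling division, mirroring t_size = (top_k + BS - 1) / BS in integer math."""
    return (top_k + bs - 1) // bs


# Hypothetical top_k values, for illustration only.
for top_k in (1000, 2000, 3000):
    for bs in (512, 100):
        print(f"top_k={top_k}, BS={bs} -> t_size={t_size_for(top_k, bs)}")
```

Lowering BS raises t_size for the same top_k, so the two values are best logged together.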

Aug 15 '22 05:08 Im-JimmyHu

I got the same problem while using mmrotate/configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90.py. Have you solved it?

Sep 10 '22 06:09 suay1113

@Im-JimmyHu Sorry for the late reply. I notice that you are using rotated-detection_onnxruntime_dynamic.py to convert your model. TensorRT is the recommended backend on Jetson devices; please give it a try.

@suay1113 Could you provide more details about how you converted your model, and about your environment?

Sep 13 '22 08:09 grimoire
