mmdeploy
RuntimeError: CUDA error: invalid configuration argument
I'd like to deploy an mmrotate model to a JetPack (Jetson) device.
Following https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/01-how-to-build/jetsons.md, I nearly finished installing the environment on JetPack 4.6.1 but gave up, because mmrotate only supports Python 3.7+.
I then flashed the system to JetPack 5.0. After fixing some environment errors, I finished building the environment and tested the mmdetection demo, which ran without problems.
I then turned to mmrotate and built it following its README.md. I followed https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/04-supported-codebases/mmrotate.md to deploy, but got the following error:
python tools/deploy.py configs/mmrotate/rotated-detection_onnxruntime_dynamic.py $MMROTATE_DIR/configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le135.py $MMROTATE_DIR/checkpoints/rotated_retinanet_obb_r50_fpn_1x_dota_le135-e4131166.pth $MMROTATE_DIR/demo/demo.jpg --work-dir work-dirs/mmrotate/rotated_retinanet/ort --show --device cuda:0
[2022-08-12 16:19:20.136] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
[2022-08-12 16:19:27.949] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
[2022-08-12 16:19:35.684] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-08-12 16:19:35,711 - mmdeploy - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
load checkpoint from local path: /home/me/project/mm-env/mmrotate/checkpoints/rotated_retinanet_obb_r50_fpn_1x_dota_le135-e4131166.pth
2022-08-12 16:19:52,400 - mmdeploy - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
2022-08-12 16:19:52,401 - mmdeploy - INFO - Export PyTorch model to ONNX: work-dirs/mmrotate/rotated_retinanet/ort/end2end.onnx.
/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py:158: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/me/project/mm-env/mmdeploy/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/me/project/mm-env/mmrotate/mmrotate/core/bbox/coder/delta_xywha_rbbox_coder.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(0) == bboxes.size(0)
Process Process-2:
Traceback (most recent call last):
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 92, in torch2onnx
    export(
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in wrap
    return self.call_function(func_name, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/onnx/export.py", line 122, in export
    torch.onnx.export(
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/__init__.py", line 319, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 113, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 716, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 10, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 496, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/utils.py", line 388, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/archiconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1099, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 70, in base_detector__forward
    return __forward_impl(ctx, self, img, img_metas=img_metas, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 26, in __forward_impl
    return self.simple_test(img, img_metas, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/models/single_stage_rotated_detector.py", line 32, in single_stage_rotated_detector__simple_test
    outs = self.bbox_head.get_bboxes(*outs, img_metas, rescale=rescale)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 379, in wrapper
    return self.func(self, *args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/models/dense_heads/rotated_anchor_head.py", line 132, in rotated_anchor_head__get_bbox
    return multiclass_nms_rotated(
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/core/post_processing/bbox_nms.py", line 178, in multiclass_nms_rotated
    return mmdeploy.codebase.mmrotate.core.post_processing.bbox_nms.
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/codebase/mmrotate/core/post_processing/bbox_nms.py", line 118, in _multiclass_nms_rotated
    selected_indices = ONNXNMSRotatedOp.apply(boxes, scores, iou_threshold,
  File "/home/me/project/mm-env/mmdeploy/mmdeploy/mmcv/ops/nms_rotated.py", line 42, in forward
    box_inds = ext_module.nms_rotated(_boxes, _scores, order,
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-08-12 16:20:01,249 - mmdeploy - ERROR - mmdeploy.apis.pytorch2onnx.torch2onnx with Call id: 0 failed. exit.
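The error message itself suggests setting CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so the traceback points at the launch that actually failed. A minimal sketch of rerunning the command that way from Python follows; `blocking_env` and `run_blocking` are hypothetical helper names, not part of mmdeploy.

```python
import os
import subprocess


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run `cmd` (a list of arguments) with CUDA_LAUNCH_BLOCKING=1 set.

    Hypothetical helper: pass the same tools/deploy.py arguments used above.
    """
    return subprocess.run(cmd, env=blocking_env(), check=False)
```

For example, `run_blocking(["python", "tools/deploy.py", ...])` with the remaining arguments copied from the failing command. Setting the variable in the shell before the command has the same effect.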
Normally that means something is wrong with the launch configuration of the CUDA kernel. Please add some logging at https://github.com/open-mmlab/mmdeploy/blob/670a5045022d4c541b0b61027a4a975d8b18da01/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L434 to see the values of GS, BS and t_size.
Also note that a Jetson device might have fewer resources than a server device, so it would be a good idea to limit the number of boxes passed to NMS.
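One way to limit the number of boxes might be to lower the top-k settings in the deploy config. The fragment below is only a sketch: the key names (`post_processing`, `pre_top_k`, `keep_top_k`) are assumed from how mmdeploy deploy configs are usually laid out and should be checked against the files under configs/mmrotate/ in your checkout, and the numbers are illustrative, not recommended values.

```python
# Sketch of a deploy-config override (assumed key names; verify them against
# the base config your deploy config inherits from).
codebase_config = dict(
    type="mmrotate",
    task="RotatedDetection",
    post_processing=dict(
        score_threshold=0.05,
        iou_threshold=0.1,
        pre_top_k=1000,  # illustrative: candidate boxes kept before NMS
        keep_top_k=200,  # illustrative: boxes kept after NMS
    ),
)
```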
@grimoire Following your advice, I added the line printf("\nGS:%d,BS:%d,t_size_%d\n",GS,BS,t_size); at line 433 to print GS, BS and t_size, but the log doesn't show the output.
I worked the values out by hand from https://github.com/open-mmlab/mmdeploy/blob/670a5045022d4c541b0b61027a4a975d8b18da01/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L429: BS = 512, GS = 15.
Besides, even when I turned BS down from 512 to 100, it didn't help.
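As a rough sanity check, the values reported above can be compared against the usual CUDA launch limits. The sketch below assumes the commonly documented limits for recent NVIDIA GPUs (1024 threads per block, 2^31 - 1 blocks in the x dimension); the real values should be queried on the target device, and `launch_problems` is a hypothetical helper. GS = 15 and BS = 512 pass this check, which may mean the failing launch is a different one: the traceback ends in `ext_module.nms_rotated`, and a grid dimension of zero (for instance, a kernel sized from an empty box tensor) is one common trigger for this error.

```python
# Assumed limits for recent NVIDIA GPUs; query the real ones on the device.
MAX_THREADS_PER_BLOCK = 1024
MAX_GRID_DIM_X = 2**31 - 1


def launch_problems(grid_size, block_size):
    """Return a list of reasons why a 1-D launch <<<grid_size, block_size>>> is invalid."""
    problems = []
    if grid_size <= 0:
        problems.append("grid size must be at least 1")
    elif grid_size > MAX_GRID_DIM_X:
        problems.append("grid size exceeds the x-dimension limit")
    if block_size <= 0:
        problems.append("block size must be at least 1")
    elif block_size > MAX_THREADS_PER_BLOCK:
        problems.append("block size exceeds the threads-per-block limit")
    return problems


print(launch_problems(15, 512))  # values from the post: []
print(launch_problems(0, 512))   # ['grid size must be at least 1']
```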
What about t_size?
Because the log doesn't print the value, I can't get t_size. Inside the function I also can't get the value of top_k, which I would need to compute t_size through t_size = (top_k + BS - 1) / BS;
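The formula quoted above is an integer ceiling division: it gives the number of BS-sized chunks needed to cover top_k items. A small sketch with hypothetical top_k values (the real top_k in this run is unknown) shows the resulting t_size for BS = 512:

```python
def ceil_div(n, d):
    """Integer ceiling division, equivalent to (n + d - 1) / d on C ints."""
    return (n + d - 1) // d


BS = 512
for top_k in (1000, 2000, 3000):  # hypothetical values, not taken from the log
    print(top_k, ceil_div(top_k, BS))
# 1000 2
# 2000 4
# 3000 6
```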
I got the same problem while using mmrotate/configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90.py.
Have you solved it?
@Im-JimmyHu
Sorry for the late reply.
I notice that you are using rotated-detection_onnxruntime_dynamic.py to convert your model. It is recommended to use TensorRT on Jetson devices; please give it a try.
@suay1113 Could you provide more details about how you convert your model, and about your environment?