q yao
Jetson devices have a limited number of registers. Try smaller values for `pre_top_k`, `keep_top_k`, and `max_output_boxes_per_class` in https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmrotate/rotated-detection_static.py
What is the value of `t_size` in https://github.com/open-mmlab/mmdeploy/blob/2882c64eea8640f913588f6962e66abf2e7b6c86/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L431 ?
I see. You can try using a smaller `BS`, which might reduce the register usage of the kernel. Note that a smaller `BS` means a larger `t_size`, which will also enlarge register...
> I think that the implementation of the kernel itself cause this CUDA error. `cudaErrorLaunchOutOfResources` will be raised when the kernel uses too many registers. Both `BS` and `t_size` are...
Rotated Faster R-CNN is supported in the dev-1.x branch.
`t_size` is the cache size of each CUDA thread in the NMS kernel. https://github.com/NVIDIA/TensorRT/blob/96e23978cd6e4a8fe869696d3d8ec2b47120629b/plugin/common/kernels/allClassNMS.cu#L196 A large cache size leads to low occupancy (a large number of registers is required). https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm If you insist...
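To make the register argument concrete, here is a rough sketch of the per-block register budget check that leads to `cudaErrorLaunchOutOfResources`. The helper names `fits_register_budget` and `max_regs_per_thread` are hypothetical, and the limits used in the usage note (32768 and 65536 registers per block) are illustrative assumptions; the real limit comes from `cudaDeviceProp::regsPerBlock`, and the real per-thread register count of a compiled kernel from `cudaFuncAttributes::numRegs`. The sketch also ignores register allocation granularity, so treat it as an approximation.

```cpp
#include <cassert>

// Hypothetical helper: true if a block of `block_size` threads, each using
// `regs_per_thread` registers, fits in the per-block register limit.
// Simplification: ignores the hardware's register allocation granularity.
inline bool fits_register_budget(int regs_per_thread, int block_size,
                                 int max_regs_per_block) {
    assert(regs_per_thread > 0 && block_size > 0);
    return static_cast<long long>(regs_per_thread) * block_size
           <= max_regs_per_block;
}

// Hypothetical helper: the largest per-thread register count that still
// allows a block of `block_size` threads to launch.
inline int max_regs_per_thread(int block_size, int max_regs_per_block) {
    assert(block_size > 0);
    return max_regs_per_block / block_size;
}
```

With an assumed limit of 32768 registers per block, a 512-thread block can use at most 64 registers per thread, so a kernel compiled to 65 registers per thread would fail to launch, while the same kernel would fit under an assumed limit of 65536.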
`X` is the `t_size` you want.
```
// BS is 512
const int t_size = (top_k + BS - 1) / BS;
```
So `top_k = 30000` requires `t_size = 59`. I...
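The formula is a ceiling division, so it can be checked on the host before rebuilding the plugin. A minimal sketch, assuming the same `(top_k + BS - 1) / BS` expression; the function names `per_thread_cache_size` and `max_top_k_for_cache` are hypothetical and not part of mmdeploy or TensorRT:

```cpp
#include <cassert>

// Per-thread cache size: ceil(top_k / block_size) using integer division,
// mirroring `(top_k + BS - 1) / BS`.
inline int per_thread_cache_size(int top_k, int block_size) {
    assert(top_k > 0 && block_size > 0);
    return (top_k + block_size - 1) / block_size;
}

// Inverse view: the largest top_k that a given cache size can cover.
inline int max_top_k_for_cache(int t_size, int block_size) {
    assert(t_size > 0 && block_size > 0);
    return t_size * block_size;
}
```

For example, `per_thread_cache_size(30000, 512)` evaluates to 59, and `max_top_k_for_cache(59, 512)` gives 30208, the largest `top_k` that cache size can hold with 512 threads per block.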
@redzhepdx Hi, I am the maintainer of this repo and `mmdet2trt`. Honestly, I am surprised that these repos are still being used by someone. Since `MMDetection` has been updated to...
The log indicates that the buffer size is smaller than a scalar. There must be something wrong when saving the engine. Could you provide more detail?
It seems that this error is caused by ops serialization. What model are you using?