q yao comments

Results: 318 comments by q yao

[Bug] TRTBatchedRotatedNMS PlugIn CUDA kernels raise a runtime CUDA error - cudaErrorLaunchOutOfResources on NVIDIA Jetson TX2

Jetson devices have a limited number of registers available. Try smaller values for `pre_top_k`, `keep_top_k`, and `max_output_boxes_per_class` in https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmrotate/rotated-detection_static.py

[Bug] TRTBatchedRotatedNMS PlugIn CUDA kernels raise a runtime CUDA error - cudaErrorLaunchOutOfResources on NVIDIA Jetson TX2

What is the value of `t_size` in https://github.com/open-mmlab/mmdeploy/blob/2882c64eea8640f913588f6962e66abf2e7b6c86/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassRotatedNMS.cu#L431 ?

[Bug] TRTBatchedRotatedNMS PlugIn CUDA kernels raise a runtime CUDA error - cudaErrorLaunchOutOfResources on NVIDIA Jetson TX2

I see. You can try using a smaller `BS`, which might reduce the register usage of the kernel. Note that a smaller `BS` means a larger `t_size`, which will also enlarge register...

[Bug] TRTBatchedRotatedNMS PlugIn CUDA kernels raise a runtime CUDA error - cudaErrorLaunchOutOfResources on NVIDIA Jetson TX2

> I think that the implementation of the kernel itself causes this CUDA error. `cudaErrorLaunchOutOfResources` will be raised when the kernel uses too many registers. Both `BS` and `t_size` are...
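The register argument in the quoted comment can be made concrete with a rough check. This is a minimal sketch under a simplified model, not the plugin's code: `launchFits` is a hypothetical helper, and the 32768-register per-block limit in the example is an assumed figure. On a real device the limit can be read from `cudaDeviceProp::regsPerBlock` and a kernel's per-thread register count from `cudaFuncGetAttributes`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): a launch needs about
// regs_per_thread * threads_per_block registers, and fails with
// cudaErrorLaunchOutOfResources when that exceeds the per-block limit.
// Real hardware allocates registers per warp with some granularity,
// which this sketch ignores.
constexpr bool launchFits(std::int64_t regs_per_thread,
                          std::int64_t threads_per_block,
                          std::int64_t regs_per_block_limit) {
    return regs_per_thread * threads_per_block <= regs_per_block_limit;
}

// Example with assumed figures: 512 threads per block and a
// 32768-register per-block limit.
inline void checkLaunchExamples() {
    assert(launchFits(64, 512, 32768));   // 64 * 512 = 32768, exactly at the limit
    assert(!launchFits(65, 512, 32768));  // one more register per thread no longer fits
    assert(launchFits(128, 256, 32768));  // halving the block doubles the per-thread budget
}
```

Under this model, a kernel can be brought back under the limit either by lowering the per-thread register count or by launching fewer threads per block.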

[Bug] - mmdeploy - ERROR - scope_name should be a string, but got None

Rotated Faster R-CNN is supported in the dev-1.x branch.

[BUG] TensorRT optimised model is detecting less objects compared to pytorch model, most likely some difference in post processing.

`t_size` is the cache size of each CUDA thread in the NMS kernel. https://github.com/NVIDIA/TensorRT/blob/96e23978cd6e4a8fe869696d3d8ec2b47120629b/plugin/common/kernels/allClassNMS.cu#L196 A large cache size leads to low occupancy (a large number of registers is required). https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm If you insist...
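The link between register use and occupancy can be estimated roughly as follows. This is a sketch under a simplified model that only considers the register limit: `registerLimitedOccupancy` is a hypothetical helper, and the figures in the example (65536 registers and 2048 resident threads per multiprocessor) are assumptions, not values taken from the issue.

```cpp
#include <algorithm>
#include <cassert>

// Fraction of the maximum resident threads per multiprocessor that can
// be active when registers are the only limiting resource (assumption:
// shared memory and block-count limits are ignored).
inline double registerLimitedOccupancy(int regs_per_thread,
                                       int threads_per_block,
                                       int regs_per_sm,
                                       int max_threads_per_sm) {
    const int regs_per_block = regs_per_thread * threads_per_block;
    if (regs_per_block <= 0 || max_threads_per_sm <= 0) return 0.0;
    // Number of whole blocks whose registers fit on one multiprocessor.
    const int blocks = regs_per_sm / regs_per_block;
    const int resident = std::min(blocks * threads_per_block, max_threads_per_sm);
    return static_cast<double>(resident) / max_threads_per_sm;
}

// Example with assumed figures and 512 threads per block.
inline void checkOccupancyExamples() {
    assert(registerLimitedOccupancy(32, 512, 65536, 2048) == 1.0);
    assert(registerLimitedOccupancy(64, 512, 65536, 2048) == 0.5);
    assert(registerLimitedOccupancy(128, 512, 65536, 2048) == 0.25);
}
```

In this model, each doubling of the per-thread register count halves the number of threads that can be resident at once, which is the effect a larger per-thread cache would have.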

[BUG] TensorRT optimised model is detecting less objects compared to pytorch model, most likely some difference in post processing.

`X` is the `t_size` you want.

```
// BS is 512
const int t_size = (top_k + BS - 1) / BS;
```

So 30000 requires `t_size = 60` I...
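The ceiling division in the comment above can be checked in isolation. This is a minimal sketch, not the plugin's code: `tSizeFor` and `kBS` are hypothetical names, and `BS = 512` is taken from the comment in the snippet.

```cpp
#include <cassert>

// Hypothetical helper: per-thread cache slots needed so that a block of
// `bs` threads can cover `top_k` candidate boxes (ceiling division).
constexpr int tSizeFor(int top_k, int bs) {
    return (top_k + bs - 1) / bs;
}

// BS = 512, as stated in the comment above.
constexpr int kBS = 512;

static_assert(tSizeFor(512, kBS) == 1, "one slot per thread covers 512 boxes");
static_assert(tSizeFor(513, kBS) == 2, "one extra box needs a second slot");

inline void checkTSizeExamples() {
    assert(tSizeFor(30000, kBS) == 59);   // 30511 / 512 in integer arithmetic
    assert(tSizeFor(30000, 256) == 118);  // a smaller block needs more slots per thread
}
```

Evaluated literally, the formula gives 59 for `top_k = 30000`, just under the 60 quoted in the comment. Halving the block size to 256 raises the result to 118, which shows why a smaller `BS` implies a larger `t_size`.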


MASKRCNN trt_model has Zero output

compatibility with mmdetection-to-tensorrt and deepstream
