yolov5
New way to register NMS in ONNX for TensorRT, ONNX Runtime and OpenVINO
The previous PR provided a basic solution for exporting ONNX and modifying the graph afterwards. This PR improves on it so that the registered NMS depends entirely on PyTorch. Simple yet effective!
@triple-Mu thanks for the PR! The easiest argument structure is to simply use an --nms arg that would be handled accordingly for formats that are NMS-capable.
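A minimal sketch of how such a flag might look in an export argument parser (hypothetical: the flag name follows the suggestion above, but the surrounding options and how export.py actually wires it are assumptions):

```python
import argparse

def make_parser():
    # Hypothetical export parser with an --nms store_true flag, in the
    # style of yolov5's export.py; option names here are illustrative.
    parser = argparse.ArgumentParser()
    parser.add_argument("--include", nargs="+", default=["onnx"],
                        help="formats to export, e.g. onnx engine tfjs")
    parser.add_argument("--nms", action="store_true",
                        help="register NMS for NMS-capable export formats")
    return parser

opt = make_parser().parse_args(["--include", "engine", "--nms"])
print(opt.nms)  # True when --nms is passed
```

Formats that cannot embed NMS would simply ignore the flag (or warn) rather than fail.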
This looks good @triple-Mu and the process looks correct (agree with @glenn-jocher comment above).
We have found that the EfficientNMS plugin does not always behave predictably: FP16 support was only merged recently, so it does not work correctly in environments such as the NVIDIA DeepStream Docker images (which ship older TensorRT versions). It would be better to use the TensorRT BatchedNMS plugin, which has been around longer and is more stable. Given you are already converting cx,cy,w,h to tlbr format, it should be easy to update 👍
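The cx,cy,w,h to tlbr conversion mentioned above can be sketched in plain Python (a minimal reference version; the PR itself would do this with batched tensor ops):

```python
def cxcywh_to_tlbr(box):
    """Convert a (cx, cy, w, h) box to (top-left x, top-left y,
    bottom-right x, bottom-right y), the corner format the TensorRT
    NMS plugins expect."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(cxcywh_to_tlbr((50.0, 50.0, 20.0, 10.0)))  # (40.0, 45.0, 60.0, 55.0)
```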
FP16 support was only merged in recently
Hi @visualcortex-team , TensorRT supports EfficientNMS in FP16 mode from 8.2.4 onwards.
It would be better to use the TensorRT BatchedNMS plugin which has been around longer and is more stable.
But the EfficientNMS plugin is much faster than BatchedNMS. BTW, TensorRT 8.4 GA was released today.
Thanks @zhiqwang . I guess people will either:
- need to know that the minimum supported TensorRT version for EfficientNMS is 8.2.4+ (it will not throw errors, it will just produce no results) - this could be added as a warning when exporting?
or
- a flag is provided to choose which NMS to export with (BatchedNMS vs EfficientNMS)
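The export-time warning suggested above could be sketched like this (a hypothetical helper; reading the installed version string from the tensorrt package is an assumption, so only the comparison logic is shown):

```python
EFFICIENT_NMS_MIN_TRT = (8, 2, 4)  # FP16 EfficientNMS needs TensorRT >= 8.2.4

def trt_supports_efficient_nms(version_str):
    """Return True if the given TensorRT version string (e.g. '8.4.1.5')
    is at least 8.2.4, the first release with working FP16 EfficientNMS."""
    parts = tuple(int(p) for p in version_str.split(".")[:3])
    return parts >= EFFICIENT_NMS_MIN_TRT

if not trt_supports_efficient_nms("8.0.1"):
    # Older TensorRT silently produces no detections rather than erroring,
    # so warn at export time instead of failing mysteriously at inference.
    print("WARNING: EfficientNMS requires TensorRT >= 8.2.4; "
          "older versions return no results without raising an error.")
```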
Hi @visualcortex-team
need to know that the minimum supported TensorRT version for EfficientNMS is 8.2.4+ (it will not throw errors, it will just produce no results) - this could be added as a warning when exporting?
Agreed!
a flag is provided to choose which NMS to export with (BatchedNMS vs EfficientNMS)
Of course there is no problem supporting the BatchedNMS plugin from a technical point of view. It seems TensorRT built the BatchedNMS plugin to match TensorFlow's interface, and it is very slow, so I see no need to support it.
For future readers:
- the TensorRT release @zhiqwang spoke about updated the apt packages to version tensorrt-dev/unknown 8.4.1.5-1+cuda11.6 amd64, which will work correctly with the EfficientNMS plugin.
- to decode the bounding boxes when used with NVIDIA DeepStream you will need a custom decoder implementation in nvdsinfer_custombboxparser.cpp using these mappings (unfortunately none of the others work):

object.left = p_bboxes[4*i];
object.top = p_bboxes[4*i + 1];
object.width = p_bboxes[4*i + 2] - object.left;
object.height = p_bboxes[4*i + 3] - object.top;
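For reference, the same decoding in Python (a sketch mirroring the C++ mapping above; p_bboxes is assumed to be a flat array of tlbr values, four per detection):

```python
def decode_bboxes(p_bboxes, num_detections):
    """Mirror of the nvdsinfer_custombboxparser.cpp mapping: the plugin
    emits (left, top, right, bottom) per detection, while DeepStream
    object rects want (left, top, width, height)."""
    objects = []
    for i in range(num_detections):
        left, top, right, bottom = p_bboxes[4 * i: 4 * i + 4]
        objects.append((left, top, right - left, bottom - top))
    return objects

print(decode_bboxes([10.0, 20.0, 30.0, 60.0], 1))  # [(10.0, 20.0, 20.0, 40.0)]
```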
Why can't I pass the CI tests? @glenn-jocher
@glenn-jocher
I am very happy that yolov5 will support dynamic batch in https://github.com/ultralytics/yolov5/pull/8526.
At the same time, I have also applied dynamic batch to the registered NMS!
--nms
(default) embeds TensorRT NMS into the ONNX model, or tf.js NMS into the tf.js model
--nms 0 (or any int)
embeds ONNX NMS into the ONNX model; the value is used as max-wh in NMS
--dynamic
(default) makes all axes dynamic in the ONNX or tf.js model; does not support ONNX for TensorRT
--dynamic 0
makes only the batch axis dynamic; also supports ONNX for TensorRT
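The max-wh value used by --nms refers to the usual trick for running class-aware NMS through a single class-agnostic pass: each box is offset by class_index * max_wh, so boxes of different classes can never overlap and suppress each other. A minimal pure-Python sketch (the exported graph would express this with the ONNX NonMaxSuppression operator instead):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms_with_class_offset(boxes, scores, classes, max_wh=4096, iou_thres=0.45):
    """Class-aware NMS via the max-wh offset trick: shifting each box by
    class * max_wh keeps different classes from suppressing each other."""
    shifted = [(x1 + c * max_wh, y1 + c * max_wh,
                x2 + c * max_wh, y2 + c * max_wh)
               for (x1, y1, x2, y2), c in zip(boxes, classes)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(shifted[i], shifted[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep
```

With this, two heavily overlapping boxes of different classes both survive, while a same-class duplicate is suppressed; max_wh just needs to exceed the largest image dimension.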
@triple-Mu thanks for the input! We'll make sure to consider the dynamic batching and NMS options in our future development. Your feedback is greatly appreciated!