Add NMS for TensorRT 8.0+ / ONNX Runtime / OpenVINO (the same way as ONNX Runtime)
This may be the easiest way to register the EfficientNMS plugin in ONNX and build a TensorRT engine. I was inspired by this issue: https://github.com/ultralytics/yolov5/issues/6430
@triple-Mu thanks for the PR, this looks great! Especially like the usage example notebook.
If this works for TRT, can it also work for ONNX exports?
This PR exports ONNX with the default method, adds an additional graph structure so that the network output matches the input of the TRT NMS plugin, and finally adds the NMS plugin so the network can run detection end-to-end. I'm not sure what you mean by ONNX exports? The original ONNX alone cannot achieve end-to-end detection.
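For readers new to the approach, the sketch below shows what registering an EfficientNMS_TRT node in an ONNX graph can look like with onnx_graphsurgeon. It is a minimal, hypothetical illustration rather than the PR's actual code: here boxes and scores are plain graph inputs, whereas the PR adds graph structure that produces them from the YOLOv5 head (see the linked notebook for the real graph surgery).
import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Illustrative shapes and thresholds only
batch, num_boxes, num_classes, topk = 1, 25200, 80, 100

boxes = gs.Variable("boxes", dtype=np.float32, shape=[batch, num_boxes, 4])
scores = gs.Variable("scores", dtype=np.float32, shape=[batch, num_boxes, num_classes])

num_dets = gs.Variable("num_dets", dtype=np.int32, shape=[batch, 1])
det_boxes = gs.Variable("det_boxes", dtype=np.float32, shape=[batch, topk, 4])
det_scores = gs.Variable("det_scores", dtype=np.float32, shape=[batch, topk])
det_classes = gs.Variable("det_classes", dtype=np.int32, shape=[batch, topk])

# The TensorRT ONNX parser maps this op name onto the EfficientNMS plugin
nms = gs.Node(
    op="EfficientNMS_TRT",
    attrs={
        "plugin_version": "1",
        "background_class": -1,
        "max_output_boxes": topk,
        "score_threshold": 0.25,
        "iou_threshold": 0.45,
        "score_activation": 0,
        "box_coding": 0,
    },
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
)

graph = gs.Graph(nodes=[nms], inputs=[boxes, scores],
                 outputs=[num_dets, det_boxes, det_scores, det_classes])
onnx.save(gs.export_onnx(graph), "nms_only.onnx")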
@triple-Mu yes I mean right here, the ONNX-only export (no TRT), i.e.:
python export.py --include onnx --nms

EDIT: Since it seems like the NMS modification is done directly on the ONNX model, perhaps the PR updates are suitable as well for the export_onnx() call on the line shown above.
Got it. That means using the --nms flag (with score/iou thresholds) would export an ONNX model that can only be used for TRT, and TRT engine building would be removed from this PR. If so, this ONNX will not be usable with onnxruntime, openvino, and so on.
@glenn-jocher This new PR modifies the ONNX export method and adds handling for the nms flag. The exported ONNX has been tested, and the engine can be exported directly with trtexec. All test code can be seen in the notebook.
@triple-Mu I'd like to handle your two PRs today. But I'm confused as the original PR https://github.com/ultralytics/yolov5/pull/6984 was limited in scope to adding trtexec support but now seems expanded. Can you please summarize the changes in each and if they overlap anywhere? Also what's your recommendation, should we merge 1 or the other or both, and if both in which order?
- https://github.com/ultralytics/yolov5/pull/6984
- https://github.com/ultralytics/yolov5/pull/7736
@glenn-jocher
Thank you for your reply! PR #6984 is just a simple attempt: trtexec can directly convert the ONNX exported by #7736 into an engine, as shown in my notebook. Since the ONNX exported by #7736 cannot be used together with detect.py, I suggest closing #6984 and adding documentation for exporting with trtexec to #7736.
@triple-Mu ok got it! Let's close #6984 then and please add the python export.py --include engine --trtexec flag capability to #7736 for trtexec engine exports. Can you do that?
It is my pleasure to be able to help you. I have the following questions:
- If we use python export.py --include engine --trtexec, does it mean that the export_onnx function in #7736 needs to be deleted (going back to the original version of this PR), with the modified ONNX placed in export_engine instead?
- If the current export_onnx function is still retained, does it mean that I need to call export_onnx and add an "export_engine_with_trtexec" function when executing this command?
@triple-Mu I think the two topics are separate:
- --trtexec: I think the original trtexec PR was limited in scope to simply adding a --trtexec flag to export.py which ran export via the trtexec command instead of the tensorrt pip package (nothing changed about the exported TensorRT models). The python export.py --include engine --trtexec export appeared to work maybe 2x faster than default (i.e. maybe 2 minutes instead of 4 minutes to export), which could be helpful to users exporting many models (a sketch of this path follows below).
- NMS pipelining. This has been a topic across a variety of formats, i.e. CoreML, ONNX and TensorRT, where users are looking to deploy without the PyTorch dependency. This PR appears to implement this well for TensorRT, so no additional changes should be needed here.
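For reference, the --trtexec path described in the first point essentially shells out to the trtexec binary instead of using the tensorrt Python package. A minimal sketch is below; the binary path and flags mirror the command that appears in the log later in this thread and are assumptions, not the PR's exact implementation.
import subprocess

# Build a TensorRT engine by invoking the trtexec binary directly
cmd = [
    "/usr/src/tensorrt/bin/trtexec",
    "--onnx=yolov5s.onnx",
    "--saveEngine=yolov5s.engine",
    "--workspace=4096",
]
subprocess.run(cmd, check=True)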
@glenn-jocher All right!
However, after registering NMS, the ONNX cannot be converted to an engine normally with python-tensorrt, because the instruction trt.init_libnvinfer_plugins(trt_logger, namespace="") needs to be added to introduce the plugin namespace.
In addition, when the PyTorch model is loaded in the main process, the export may be affected by problems such as the CUDA stream.
Exporting with trtexec may therefore require opening a new process.
The above is what I am currently testing.
In addition, I would like to ask if you have a social account to connect with?
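For context, the trt.init_libnvinfer_plugins call mentioned above fits into the usual Python TensorRT build flow roughly as sketched below. This is not the PR's code, and the config API differs across TensorRT 8.x minor versions (max_workspace_size is deprecated in favour of set_memory_pool_limit in newer releases).
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, namespace="")  # registers EfficientNMS_TRT and friends

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov5s.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX")

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30  # 4 GB; deprecated in newer TRT 8.x

engine_bytes = builder.build_serialized_network(network, config)
with open("yolov5s.engine", "wb") as f:
    f.write(engine_bytes)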
@glenn-jocher
I'm not sure why I can't export with the following command after adding the above:
python export.py --weights yolov5s.pt --include engine --trtexec
If I run subprocess.check_output(cmd, shell=True) on its own in a new Python file, it executes correctly.
So I suspect it has something to do with PyTorch model loading. Is there a conflict with the main process?
Log is as shown:
(torch) ubuntu@y9000p:~/work/yolov5$ python export.py --weights yolov5s.pt --include engine --trtexec
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, trtexec=True, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5 🚀 v6.1-224-gba552fe Python-3.8.13 torch-1.11.0+cu115 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)
[05/19/2022-22:43:30] [W] --workspace flag has been deprecated by --memPoolSize flag.
Cuda failure: no CUDA-capable device is detected
Aborted (core dumped)
Traceback (most recent call last):
File "export.py", line 646, in <module>
main(opt)
File "export.py", line 641, in main
run(**vars(opt))
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "export.py", line 561, in run
f[1] = export_engine(model, im, file, train, half, simplify, workspace, verbose, trtexec)
File "export.py", line 258, in export_engine
subprocess.check_output(cmd, shell=True)
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --workspace=4096' returned non-zero exit status 134.
@glenn-jocher Good news! I recently tried adding NMS to other inference backends like onnxruntime and openvino, and the results were astounding! Just modifying part of the ONNX graph achieves a very good effect. It is worth mentioning that although the --export-type ort flag can be turned on to export the graph with NMS, some post-processing operations are still required. I did not place the post-processing entirely in the graph, which may cause the network to output extra tensors. You can see all the results in the notebooks~
I recently reworked this branch again. Could you please review this PR?
Would greatly appreciate this feature being rolled into the production version. Exporting object detection models to something like ONNX with NMS will allow many people to use lightweight frameworks on edge devices or things like AWS Lambda. Torch is a lot of overhead just for implementing NMS.
Edit: I've tested the trtNMS branch to export the model using these arguments:
python export.py --weights mymodel.pt --include onnx --nms --conf-thres 0.4
When I inference using onnxruntime, I am getting different results than I am with detect.py. It seems like the conf_thres on the ONNX model has some lower bound of ~0.7. There are no predictions below that. The actual confidence values for each detection do not quite match either.
Edit2: It appears it is being limited to 100 response values. I tried modifying the "max_output_boxes" to be 1000 but it still only returns 100 detections per image.
Edit3: I needed to modify the --topk-per-class and --topk-all to be 100. This yielded more than 100 results. Detections and confidence with onnxruntime don't exactly match but we're in the ballpark.
Hi, @triple-Mu! Thanks for your amazing work on adding NMS!
@wolfpack12 has mentioned that outputs of the models exported with this PR do not exactly match the original outputs of the .pt model. Have you confirmed that models exported with this PR work properly and have the same or close outputs to the original? I'm especially interested in the TensorRT version.
Thank you!
I re-updated the code of this PR, please try again. Usage: For TensorRT NMS export:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For onnxruntime nms export:
python3 export.py --weights yolov5s.pt --include onnx --nms ort --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For openvino nms export:
python3 export.py --weights yolov5s.pt --include openvino --nms ovo --iou 0.65 --conf 0.001 --topk-all 300 --simplify
To export a model supported by the corresponding backend, you need to specify --nms trt/ort/ovo when exporting onnx or xml. Of course, the ONNX file is always generated.
In addition, you can export models with dynamic shapes. You can add --dynamic batch or --dynamic all to first export ONNX with a dynamic batch or fully dynamic axes.
An example ONNX-for-TensorRT export command is:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify --dynamic batch
If you want to export the original yolov5 ONNX model with dynamic shapes, the command is:
python3 export.py --weights yolov5s.pt --include onnx --simplify --dynamic
You don't need to pass arguments to --dynamic.
If you want to export the original yolov5 tflite model with NMS, the command is:
python3 export.py --weights yolov5s.pt --include tflite --nms
You don't need to pass arguments to --nms.
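As a quick sanity check after the ORT-flavoured export (--nms ort), something like the following can be used to inspect the model outputs with onnxruntime. The input preprocessing and output layout below are assumptions; check session.get_outputs() and the notebook for the exact names and order of your exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])

# Dummy letterboxed input: RGB, scaled to [0, 1], NCHW, 640x640
im = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: im})

for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, out.shape, out.dtype)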
I’ll test in the new year. Just curious, how is this implementation different than yolort?
The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things:
1. export.py fails on models where the --nms argument is used on export (see error message below)
nc = prediction.shape[2] - nm - 5 # number of classes
IndexError: tuple index out of range
2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:
ort_session = onnxruntime.InferenceSession(model, providers = ['CPUExecutionProvider'])
ort_inputs = {ort_session.get_inputs()[0].name: image}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs
Question 1: It should be caused by your use of the non_max_suppression function. This shouldn't happen when export.py is executed; can you provide the run command?
Question 2: To avoid the case where no object is detected in the picture (for example, on randomly generated noise), I added a class of -1 and a box with score 0 for this case in post-processing. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy, please refer to my submitted notebook.
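A sketch of that secondary filter, assuming the outputs are ordered as (num_dets, boxes, scores, classes) and padded with score-0 / class -1 rows; the dummy arrays below stand in for real session.run() results and the names are illustrative.
import numpy as np

# Stand-ins for the model outputs (batch of 1, up to 300 padded detections)
num_dets = np.array([[2]], dtype=np.int32)
det_boxes = np.zeros((1, 300, 4), dtype=np.float32)
det_scores = np.zeros((1, 300), dtype=np.float32)
det_classes = np.full((1, 300), -1, dtype=np.int32)
det_scores[0, :2] = [0.9, 0.8]
det_classes[0, :2] = [0, 17]

n = int(num_dets[0, 0])          # valid detections for image 0
keep = det_scores[0, :n] > 0     # drop any score-0 / class -1 placeholders
boxes = det_boxes[0, :n][keep]
scores = det_scores[0, :n][keep]
classes = det_classes[0, :n][keep]
print(len(boxes), "detections after filtering")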
Sorry I had a typo. The error in Question 1 is when detect.py is used. It attempts to run the non_max_suppression function on the custom ONNX model where NMS is part of the graph.
Here's the run command:
python detect.py --weights weights/model1.onnx --source image1.tif --conf-thres 0.4 --imgsz 512 640 --save-txt --iou-thres 0.45
Here's more granular output of the error:
Loading weights/model1.onnx for ONNX Runtime inference...
Traceback (most recent call last):
File "/home/user/onnxexportyolov5/yolov5/detect.py", line 261, in <module>
main(opt)
File "/home/user/onnxexportyolov5/yolov5/detect.py", line 256, in main
run(**vars(opt))
File "/home/user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/user/onnxexportyolov5/yolov5/detect.py", line 132, in run
pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
File "/home/user/onnxexportyolov5/yolov5/utils/general.py", line 912, in non_max_suppression
nc = prediction.shape[2] - nm - 5 # number of classes
IndexError: tuple index out of range
For Question 2, the notebook is a great addition. Stepping through the process of exporting the model and then inferencing using onnxruntime will be very helpful to others. I suspect the issue I'm having is the conversion of the image to a tensor. I'm trying to execute this within an AWS Lambda Function (this was not trivial to do). The way I was converting the image is different than your method:
import io
import numpy as np
from PIL import Image

imageStream = io.BytesIO(binary_content[0])
imageFile = Image.open(imageStream).convert('RGB').resize((512, 640))
imageFile_Array = np.asarray(imageFile).astype('float32') / 255.0
imageFile_Array = imageFile_Array[None]  # add batch dimension
imageFile_Array = np.transpose(imageFile_Array, [0, 3, 1, 2])  # NHWC -> NCHW
It seems that you are feeding an input tensor with shape 512x640. Because we export the ONNX with shape 640x640, feeding a tensor with the wrong shape won't work.
@triple-Mu Unfortunately that isn't the issue. I can send it a 640x640 image and the results still don't match. I suspect the issue is the use of letterbox (Still need to confirm). In your example notebook, you import letterbox from YOLOv5 which requires cv2 to be imported. If I want to run this in AWS Lambda, I don't want to import cv2 or torch since it would exceed the 250MB limit. So I'd need to implement using numpy or base python. Will provide results when I dig more into this.
Maybe you can save the input tensor to your local PC as an npy file. Also, I suggest using np.ascontiguousarray when transposing an ndarray. I'm not sure about the code you provided; it seems too simple.
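A tiny illustration of the two suggestions above; the array here is just a stand-in for the preprocessed image.
import numpy as np

hwc = np.zeros((640, 640, 3), dtype=np.float32)             # letterboxed HWC image, already / 255
nchw = np.ascontiguousarray(hwc.transpose(2, 0, 1)[None])   # HWC -> NCHW, contiguous in memory
np.save("ort_input.npy", nchw)                              # reload elsewhere with np.load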
I added the letterboxing function below. It helps increase the accuracy but it's still slightly off.
from PIL import Image

def letterbox_image(image, size):
    # Resize with unchanged aspect ratio and pad the borders with gray (114)
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (114, 114, 114))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image
I call it in my Lambda function using this:
imageFile = letterbox_image(Image.open(imageStream), (640, 640) )
EDIT: I'm increasingly confident this is a resizing/letterbox issue. I've played around with changing the padding color from (114, 114, 114) to (0, 0, 0) and (255, 255, 255). This actually affects the number of detections the model makes!
In addition, the scaling method matters. In the code above, the Image.BICUBIC method is used for interpolation when scaling. In YOLOv5, the letterbox function uses the cv2 code below:
im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
When I change the interpolation to Image.BILINEAR or Image.NEAREST, it makes a significant impact on the number of detections. I think this is the root cause of the problem, and I am doubtful I will ever exactly match the output.
The takeaway is that these models are extremely sensitive to very small changes. Scaling method, background color and input size have an unpredictable impact on model performance.
EDIT2: For anyone who is morbidly curious, the difference in interpolation between PIL and cv2 is discussed ad nauseam here: https://github.com/python-pillow/Pillow/issues/2718
I found that Image.BICUBIC had the closest results to the cv2.resize method used in YOLOv5. I tried Image.BILINEAR since, you know, it should be equivalent to cv2.INTER_LINEAR. But it wasn't!
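A toy comparison (not from the thread; it assumes cv2, Pillow and numpy are installed) makes the mismatch easy to see:
import cv2
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
img = (rng.random((512, 640, 3)) * 255).astype(np.uint8)

# Downscale the same image with OpenCV INTER_LINEAR and Pillow BILINEAR
cv_out = cv2.resize(img, (320, 256), interpolation=cv2.INTER_LINEAR)
pil_out = np.asarray(Image.fromarray(img).resize((320, 256), Image.BILINEAR))

print("max abs diff:", np.abs(cv_out.astype(int) - pil_out.astype(int)).max())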
This commentary goes beyond the scope of this issue (exporting NMS for onnxruntime). I believe the branch that @triple-Mu created accomplishes this. The only thing I see that needs to be wrapped up is ensuring NMS-enabled ONNX models can use the detect.py function in YOLOv5 without throwing an error.
👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.
We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐