
Drop in mAP after TensorRT optimization

philipp-schmidt opened this issue 3 years ago • 29 comments

@jkjung-avt Hi, could we work together on the problem of the reduced accuracy? I believe I have similar issues in my implementation, and I do not use any ONNX conversion whatsoever. I would like to get this fixed and could use additional examples of where it goes wrong to determine the cause.

We could start by working on the postprocessing method. I started with existing code for the yolo layer plugin similar to yours and already had to fix a few errors. Please let me know if my code improves your precision:

https://github.com/isarsoft/yolov4-triton-tensorrt/blob/master/clients/python/processing.py

philipp-schmidt avatar Jan 02 '21 16:01 philipp-schmidt

Here are all fixes I made so far: https://github.com/isarsoft/yolov4-triton-tensorrt/commits/master/clients/python/processing.py

philipp-schmidt avatar Jan 02 '21 17:01 philipp-schmidt

Hi, could we work together on the problem of the reduced accuracy?

That sounds good.

Here are all fixes I made so far: https://github.com/isarsoft/yolov4-triton-tensorrt/commits/master/clients/python/processing.py

I have read through your commit history. I think my current code does not have those issues you've fixed in your own code...

I did reference the original AlexeyAB/darknet code to develop my implementation. For example, "scale_x_y", which is used in the yolov4/yolov4-tiny models, affects how the center x/y coordinates of bboxes are calculated. I implemented that calculation in the "yolo_layer" plugin.

https://github.com/jkjung-avt/tensorrt_demos/blob/793d7ae9c74b51007296bb63cdd1bc6bd0e04e8e/plugins/yolo_layer.cu#L238-L239
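For reference, here is a minimal NumPy sketch of that center calculation, following the darknet formula; the names (tx, col, grid_w) are illustrative, not identifiers from the plugin:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# tx: raw conv output for the bbox center x at grid column `col`
# scale_x_y: e.g. 1.05/1.1/1.2 in the yolov4 cfgs; 1.0 reduces to a plain sigmoid
def bbox_center_x(tx, col, grid_w, scale_x_y=1.05):
    # rescale the sigmoid output around 0.5, add the cell offset,
    # then normalize by the grid width
    return (sigmoid(tx) * scale_x_y - 0.5 * (scale_x_y - 1.0) + col) / grid_w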

jkjung-avt avatar Jan 03 '21 01:01 jkjung-avt

Related issues:

  • https://github.com/jkjung-avt/tensorrt_demos/issues/237
  • https://github.com/jkjung-avt/tensorrt_demos/issues/255

jkjung-avt avatar Jan 04 '21 04:01 jkjung-avt

I will probably have time this weekend to cross-check implementations. I will get back to you when I have more info.

philipp-schmidt avatar Jan 06 '21 15:01 philipp-schmidt

@philipp-schmidt Looking forward to your updates. Meanwhile, I'm inclined to think the problem more likely lies in the darknet -> onnx -> TensorRT conversion. I will also review the code when I have time.

jkjung-avt avatar Jan 06 '21 16:01 jkjung-avt

Hi, a major source of wrong results and bad accuracy has been fixed for me in Triton Inference Server. It was a server-side race condition... I was hunting ghosts for many weeks... https://github.com/triton-inference-server/server/issues/2339

Now I can focus on mAP, I'll keep you posted.

philipp-schmidt avatar Jan 26 '21 03:01 philipp-schmidt

NVIDIA has this Polygraphy tool, which can be used to compare layer-wise outputs between the ONNX model and the TensorRT engine. I think that would be an effective way to debug this mAP drop problem.

Here is an example Polygraphy debugging output: https://github.com/NVIDIA/TensorRT/issues/1087#issuecomment-786355785

I'm not sure when I'll have time to look into this, though.
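As a sketch, the layer-wise comparison can be requested by marking all tensors as outputs in both runners (flag names per the Polygraphy docs at the time; treat the exact invocation as an assumption):

$ polygraphy run yolov3-tiny-416.onnx --trt --fp16 --onnxrt \
      --trt-outputs mark all --onnx-outputs mark all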

jkjung-avt avatar Mar 24 '21 02:03 jkjung-avt

Unfortunately, I couldn't yet make the time to fully tackle this either. This Polygraphy tool seems very helpful regardless, thanks for the pointer.

philipp-schmidt avatar Mar 24 '21 10:03 philipp-schmidt

NVIDIA's Polygraphy tool turns out to be very easy to use. I just followed the installation instructions and used the following command to debug the models.

$ polygraphy run yolov3-tiny-416.onnx --trt --fp16 --onnxrt
......
[I] Accuracy Comparison | trt-runner-N0-04/10/21-21:37:09 vs. onnxrt-runner-N0-04/10/21-21:37:09
[I]     Comparing Output: '016_convolutional' (dtype=float32, shape=(1, 255, 13, 13)) with '016_convolutional' (dtype=float32, shape=(1, 255, 13, 13))
[I]         Required tolerances: [atol=0.089517] OR [rtol=1e-05, atol=0.089425] OR [rtol=5.9166, atol=1e-05] | Mean Error: Absolute=0.010562, Relative=0.0033428
            Runner: trt-runner-N0-04/10/21-21:37:09          | Stats: mean=-6.5803, min=-15.992 at (0, 174, 0, 0), max=2.1582 at (0, 90, 12, 2)
            Runner: onnxrt-runner-N0-04/10/21-21:37:09       | Stats: mean=-6.5821, min=-16.004 at (0, 174, 0, 12), max=2.1647 at (0, 90, 12, 2)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: '023_convolutional' (dtype=float32, shape=(1, 255, 26, 26)) with '023_convolutional' (dtype=float32, shape=(1, 255, 26, 26))
[I]         Required tolerances: [atol=0.095589] OR [rtol=1e-05, atol=0.095568] OR [rtol=268.68, atol=1e-05] | Mean Error: Absolute=0.012998, Relative=0.0078038
            Runner: trt-runner-N0-04/10/21-21:37:09          | Stats: mean=-7.1557, min=-18.188 at (0, 174, 15, 25), max=3.3008 at (0, 249, 15, 21)
            Runner: onnxrt-runner-N0-04/10/21-21:37:09       | Stats: mean=-7.1579, min=-18.159 at (0, 174, 15, 25), max=3.3272 at (0, 249, 15, 21)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[E]     FAILED | Mismatched outputs: ['016_convolutional', '023_convolutional']

I summarize the results below. All comparisons are done between TensorRT FP16 and ONNX Runtime.

  • yolov3-tiny-416

    • '016_convolutional' Mean Error: Absolute=0.010562, Relative=0.0033428
    • '023_convolutional' Mean Error: Absolute=0.012998, Relative=0.0078038
  • yolov3-608

    • '082_convolutional' Mean Error: Absolute=0.018218, Relative=0.0046612
    • '094_convolutional' Mean Error: Absolute=0.018218, Relative=0.0046612
    • '106_convolutional' Mean Error: Absolute=0.020347, Relative=0.0078671
  • yolov4-tiny-416

    • '030_convolutional' Mean Error: Absolute=0.01394, Relative=0.0032779
    • '037_convolutional' Mean Error: Absolute=0.013386, Relative=0.0069264
  • yolov4-608

    • '139_convolutional' Mean Error: Absolute=0.0051023, Relative=0.0026887
    • '150_convolutional' Mean Error: Absolute=0.0070509, Relative=0.0040541
    • '161_convolutional' Mean Error: Absolute=0.0074914, Relative=0.001748

jkjung-avt avatar Apr 10 '21 14:04 jkjung-avt

NVIDIA's Polygraphy tool turns out to be very easy to use. [...]

I am guessing this is where the loss of accuracy comes from? Will there be a fix?

ROBYER1 avatar Apr 10 '21 14:04 ROBYER1

Interesting results, thanks for checking it out, jkjung. I'm curious whether TensorRT makes any guarantees regarding precision.

And taking into account that TensorRT selects from a range of different implementations for each layer, the next question is: will this accuracy drop be reproducible and consistent across different hardware?

philipp-schmidt avatar Apr 10 '21 14:04 philipp-schmidt

I re-ran Polygraphy by specifying the correct input data range for the yolo models ("--float-min 0.0 --float-max 1.0"), e.g.

$ polygraphy run yolov3-tiny-416.onnx --trt --fp16 --onnxrt --float-min 0.0 --float-max 1.0
......
[I] Accuracy Comparison | trt-runner-N0-04/11/21-12:47:58 vs. onnxrt-runner-N0-04/11/21-12:47:58
[I]     Comparing Output: '016_convolutional' (dtype=float32, shape=(1, 255, 13, 13)) with '016_convolutional' (dtype=float32, shape=(1, 255, 13, 13))
[I]         Required tolerances: [atol=0.049671] OR [rtol=1e-05, atol=0.049614] OR [rtol=18.328, atol=1e-05] | Mean Error: Absolute=0.008115, Relative=0.0037584
            Runner: trt-runner-N0-04/11/21-12:47:58          | Stats: mean=-5.2187, min=-18.516 at (0, 174, 11, 3), max=1.5859 at (0, 111, 4, 4)
            Runner: onnxrt-runner-N0-04/11/21-12:47:58       | Stats: mean=-5.2171, min=-18.497 at (0, 174, 11, 11), max=1.5708 at (0, 0, 11, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: '023_convolutional' (dtype=float32, shape=(1, 255, 26, 26)) with '023_convolutional' (dtype=float32, shape=(1, 255, 26, 26))
[I]         Required tolerances: [atol=0.069397] OR [rtol=1e-05, atol=0.069256] OR [rtol=9084.6, atol=1e-05] | Mean Error: Absolute=0.010339, Relative=0.058467
            Runner: trt-runner-N0-04/11/21-12:47:58          | Stats: mean=-5.6, min=-18.625 at (0, 174, 7, 25), max=2.4707 at (0, 19, 12, 23)
            Runner: onnxrt-runner-N0-04/11/21-12:47:58       | Stats: mean=-5.5999, min=-18.625 at (0, 174, 7, 25), max=2.4672 at (0, 19, 12, 23)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[E]     FAILED | Mismatched outputs: ['016_convolutional', '023_convolutional']
[E] FAILED | Command: /home/jkjung/project/MODNet/venv/bin/polygraphy run yolov3-tiny-416.onnx --trt --fp16 --onnxrt --float-min 0.0 --float-max 1.0

Here are the results (FP16):

  • yolov3-tiny-416

    • '016_convolutional' Mean Error: Absolute=0.008115, Relative=0.0037584
    • '023_convolutional' Mean Error: Absolute=0.010339, Relative=0.058467
  • yolov3-608

    • '082_convolutional' Mean Error: Absolute=0.01309, Relative=0.0043352
    • '094_convolutional' Mean Error: Absolute=0.016002, Relative=0.0091567
    • '106_convolutional' Mean Error: Absolute=0.016827, Relative=0.007058
  • yolov4-tiny-416

    • '030_convolutional' Mean Error: Absolute=0.0065569, Relative=0.0021531
    • '037_convolutional' Mean Error: Absolute=0.0080654, Relative=0.0048672
  • yolov4-608

    • '139_convolutional' Mean Error: Absolute=0.01843, Relative=0.010256
    • '150_convolutional' Mean Error: Absolute=0.014698, Relative=0.0067943
    • '161_convolutional' Mean Error: Absolute=0.010814, Relative=0.0046399

The TensorRT "yolov3-tiny" FP16 engine is the only one whose output has a >5% mean relative error versus onnxruntime (all others are <1%). I think this indeed explains why the TensorRT "yolov3-tiny" engine evaluates to a much worse mAP than its DarkNet counterpart, compared to the other models ("yolov3-608", "yolov4-tiny-416" and "yolov4-608")...

jkjung-avt avatar Apr 11 '21 05:04 jkjung-avt

Hello, sorry for not adding anything to the discussion, but I wanted to check: I'm currently trying to run this repository on a Jetson Nano.

Does the yolov4-tiny model also present the mAP drop that has been discussed mainly for yolov3?

Anyway, if this is unclear, I will conduct my own tests on a custom dataset and report the results back to you.

Duarte-Nunes avatar Apr 13 '21 21:04 Duarte-Nunes

Does the yolov4-tiny model also present the mAP drop that has been discussed mainly for yolov3?

Based on my mAP evaluation results, "yolov3-tiny" suffers from this problem quite a bit. The other models ("yolov3", "yolov4-tiny" and "yolov4") are probably OK.

I would focus on solving the problem for "yolov3-tiny" if I have time.

jkjung-avt avatar Apr 14 '21 02:04 jkjung-avt

@jkjung-avt I see the same problem for the yolov4-mish and yolov4-csp-swish models too: I'm getting lots of false positives, and the results are not the same as darknet's. May I know the reasons behind it, and how we can solve the false-positive problem?

akashAD98 avatar Nov 15 '21 10:11 akashAD98

@akashAD98 This is a known issue. I've done my best to make sure the code is correct for both TensorRT engine building and inference. But TensorRT engine optimization does result in an mAP drop for various YOLO models.

I have also tried to analyze this problem with Polygraphy, as shown above, but failed to find the root cause or a solution. I don't have a good answer now. That's why I've kept this issue open...

jkjung-avt avatar Nov 15 '21 10:11 jkjung-avt

@jkjung-avt Thanks for your kind reply. We all appreciate your great work. I hope you will find a solution in the future.

akashAD98 avatar Nov 15 '21 11:11 akashAD98

@jkjung-avt Can we run inference and check the FPS and false predictions of the ONNX model? Do you think its accuracy (false predictions) would be the same as TensorRT's? Do you have any script for running inference on an ONNX model, like the TensorRT one? That way we would get an idea of whether the problem is in the darknet-to-ONNX conversion or the ONNX-to-TensorRT conversion.

akashAD98 avatar Nov 19 '21 05:11 akashAD98

Do you have any script for doing inference on onnx model?

I have done that for MODNet, but not for the YOLO models. Some of the code could be reused, though: https://github.com/jkjung-avt/tensorrt_demos/blob/master/modnet/test_onnx.py

In order to check mAP and false detections with the ONNX YOLO models, you'd also have to implement the "yolo" layers in the post-processing code (this part is handled by the "yolo_layer" plugin in the TensorRT case). I don't think I'll have time to do that in the near future...
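For anyone attempting this, here is a minimal NumPy sketch of what such a "yolo" decode step could look like; the function name, shapes and anchor handling are illustrative assumptions, not code from this repo:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolo(feat, anchors, num_classes=80, scale_x_y=1.0, input_size=416):
    # feat: one raw head output of shape (1, A*(5+C), H, W), A anchors, C classes
    n_anchors = len(anchors)
    _, _, h, w = feat.shape
    feat = feat.reshape(n_anchors, 5 + num_classes, h, w)
    grid_x = np.arange(w, dtype=np.float32).reshape(1, 1, w)
    grid_y = np.arange(h, dtype=np.float32).reshape(1, h, 1)
    anchors = np.asarray(anchors, dtype=np.float32)  # (A, 2), in input pixels
    # bbox centers normalized to [0, 1], with the scale_x_y correction
    bx = (sigmoid(feat[:, 0]) * scale_x_y - 0.5 * (scale_x_y - 1) + grid_x) / w
    by = (sigmoid(feat[:, 1]) * scale_x_y - 0.5 * (scale_x_y - 1) + grid_y) / h
    # bbox sizes from the anchor priors
    bw = np.exp(feat[:, 2]) * anchors[:, 0].reshape(-1, 1, 1) / input_size
    bh = np.exp(feat[:, 3]) * anchors[:, 1].reshape(-1, 1, 1) / input_size
    # objectness and class scores are both sigmoid-activated, as in darknet
    obj = sigmoid(feat[:, 4])
    cls = sigmoid(feat[:, 5:])
    scores = obj[:, np.newaxis] * cls  # (A, C, H, W) per-class confidences
    return bx, by, bw, bh, scores

The outputs would then go through confidence thresholding and NMS, same as in the TensorRT pipeline.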

jkjung-avt avatar Nov 19 '21 07:11 jkjung-avt

Hi @jkjung-avt, do you have any idea how I should solve this issue? https://github.com/onnx/tutorials/issues/253#issuecomment-974605242

(image attached)

This is the script: inference_onnx_yolov4-mish.ipynb.txt

akashAD98 avatar Nov 20 '21 06:11 akashAD98

@jkjung-avt please have a look

akashAD98 avatar Nov 24 '21 11:11 akashAD98

@akashAD98 I already commented: https://github.com/onnx/tutorials/issues/253#issuecomment-975356463

You need to modify the postprocessing code by yourself.

jkjung-avt avatar Nov 24 '21 12:11 jkjung-avt

Preprocessed image original shape: (1, 416, 416, 3).

I converted channel-first to channel-last (image attached)

and got the error shown (image attached),

so it's saying we need (1, 3, 416, 416).
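For what it's worth, a minimal sketch of getting from NHWC to the NCHW layout the error asks for (the array names are illustrative):

import numpy as np

# preprocessed image in NHWC layout: (1, 416, 416, 3)
img_nhwc = np.zeros((1, 416, 416, 3), dtype=np.float32)

# move the channel axis from position 3 to position 1
img_nchw = np.transpose(img_nhwc, (0, 3, 1, 2))
print(img_nchw.shape)  # (1, 3, 416, 416)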

akashAD98 avatar Nov 26 '21 07:11 akashAD98

@jkjung-avt Is there any model whose results are almost the same as darknet's? yolov4-csp and yolov4-mish have false-prediction issues, so I'm looking for a TensorRT model that works well. Is yolov4 the best?

akashAD98 avatar Nov 29 '21 08:11 akashAD98

Please refer to the "mAP and FPS" table in Demo #5: YOLOv4.

jkjung-avt avatar Nov 29 '21 08:11 jkjung-avt

@jkjung-avt I converted the YOLO models into TensorRT, and I'm getting too many false predictions, as I already said in this issue.

One observation from my experiments: I trained a "home" model with 50 classes, which has very few false predictions. I trained a "music" category model with only 10 classes, and I'm getting too many false predictions.

I have done the same experiment with a few more category classes, and from those experiments I've come to believe that a model gives fewer false positives if it has more classes and more false positives if it has fewer classes.

This is just my experimental observation; if you think it can help solve this issue, please let us know. Thanks.

akashAD98 avatar Mar 03 '22 07:03 akashAD98

@akashAD98 Thanks for sharing the info. I tried to think about possible causes of such results but could not come up with any. I will keep this in mind and share my experience/thoughts when I have new findings.

jkjung-avt avatar Mar 05 '22 01:03 jkjung-avt

Hello @jkjung-avt,

For my use case, I am trying to detect only one type of object (a single class) with yolov3.

After comparing with the code of yolov3 (https://github.com/experiencor/keras-yolo3), I observe that there is a major difference in how the output class probabilities are processed.

In the original code (https://github.com/experiencor/keras-yolo3/blob/master/utils/utils.py at line 179), they apply a softmax to all class probabilities: netout[..., 5:] = netout[..., 4][..., np.newaxis] * _softmax(netout[..., 5:])

In your code, (https://github.com/jkjung-avt/tensorrt_demos/blob/master/plugins/yolo_layer.cu at line 167), you post-process the class probabilities with a sigmoid: float max_cls_prob = sigmoidGPU(max_cls_logit);

In my case, with only one class:

  • with the original code: the softmax is always equal to one, so the score associated with the bounding box is equal to pobj*pclass0 = pobj.
  • with your implementation: pclass0 is the sigmoid of the class logit, which is less than one, so the score pobj*pclass0 is smaller.

I think this can explain why the mAP is better with more classes (because the softmax becomes more similar to the sigmoid).
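A tiny numeric sketch of the difference (the values are made up for illustration):

import numpy as np

def _softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

p_obj = 0.9
class_logit = np.array([0.2])  # single-class model: one logit

# softmax over a single logit is always 1.0 -> score == p_obj
print(p_obj * _softmax(class_logit)[0])  # 0.9

# sigmoid of the same logit is < 1.0 -> the score shrinks
print(p_obj * sigmoid(class_logit)[0])   # ~0.495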

Thank you for your contribution with tensorrt_demos, Thomas

ThomasGoud avatar Apr 27 '22 10:04 ThomasGoud

@ThomasGoud Thanks for sharing your thoughts. But according to the original DarkNet implementation, the objectness and class scores are calculated by applying LOGISTIC (i.e. sigmoid) activation to the outputs of the preceding convolutional layers.

You can refer to the source code pointed to below.

https://github.com/AlexeyAB/darknet/blob/8a0bf84c19e38214219dbd3345f04ce778426c57/src/yolo_layer.c#L680

https://github.com/AlexeyAB/darknet/blob/8a0bf84c19e38214219dbd3345f04ce778426c57/src/yolo_layer.c#L1190
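As a rough NumPy sketch of what those darknet lines do (per anchor, per grid cell; memory-layout details omitted):

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# one anchor's raw conv outputs at one grid cell:
# [tx, ty, tw, th, objectness, class logits...]
raw = np.random.randn(5 + 80).astype(np.float32)

tx, ty = logistic(raw[0]), logistic(raw[1])  # x/y offsets: LOGISTIC
tw, th = raw[2], raw[3]                      # w/h: left raw, used via exp()
obj = logistic(raw[4])                       # objectness: LOGISTIC
cls = logistic(raw[5:])                      # class scores: LOGISTIC, not softmax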

jkjung-avt avatar Apr 27 '22 15:04 jkjung-avt