mmdetection-to-tensorrt error:batchedNMSPlugin.cpp

Hi, I ran into this problem: #assertion/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,143 Aborted (core dumped)

Environment:

OS: [Ubuntu]
python_version: [3.6]
pytorch_version: [1.6]
cuda_version: [cuda-10.2]
cudnn_version: [7.6.5]
mmdetection_version: [2.4]

Looking forward to your help. Thank you!

Sep 22 '20 11:09 lexiqi

Hi, could you provide the script, the model file, and the test image data?

Sep 22 '20 11:09 grimoire

script: https://github.com/grimoire/mmdetection-to-tensorrt/blob/master/demo/inference.py
model: retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth (http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth)
config: retinanet_r50_fpn_1x_coco.py (https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_fpn_1x_coco.py)
image: coco_person

Sep 22 '20 11:09 lexiqi

Hi,
I have tested the image you provided, and the converter seems to work.

Here is my test script:

python demo/inference.py \
   test.jpg \
   retinanet_r50_fpn_1x_coco.py \
   retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
   retina.pth

And the result: [screenshot from 2020-09-22 20-10-44]

Could you provide more detail on how to reproduce the error, such as the GPU device type, the arguments you pass to the script, or anything else that might be related?
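
A small script along these lines could gather most of the requested details in one go. It is only a sketch: the module names ("torch", "mmdet", "mmcv", "tensorrt") are the usual import names and are assumed to match the local installation, and the helper names collect_versions and gpu_name are hypothetical, not part of this project.

```python
import importlib


def collect_versions(module_names):
    """Map each module name to its __version__, or a note if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:  # missing package or a broken native dependency
            versions[name] = "not importable"
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


def gpu_name():
    """Return the name of GPU 0 as reported by PyTorch, or None if unavailable."""
    try:
        import torch
    except Exception:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0)


# Assumed import names; adjust the list to whatever is installed locally.
report = collect_versions(["torch", "mmdet", "mmcv", "tensorrt"])
report["gpu"] = gpu_name() or "no CUDA device visible"
for key, value in report.items():
    print(f"{key}: {value}")
```

Pasting the printed lines together with the exact command line makes it easier to compare two setups.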

Sep 22 '20 12:09 grimoire

Hi, sorry to bother you again! I have it running now, but the results are wrong, like this:

[tensor([[-1070399664]], device='cuda:0', dtype=torch.int32), tensor([[[ 26.3982, -17.1053, 81.1487, 17.1053], [ 19.2828, -21.5513, 88.2641, 21.5513], [ 38.4096, -19.2000, 69.1373, 19.2000], [ 34.4162, -24.1905, 73.1307, 24.1905], [ 29.3849, -30.4781, 78.1620, 30.4781], [ 42.9096, -27.1529, 64.6373, 27.1529], [ 40.0858, -34.2105, 67.4611, 34.2105], [ 36.5281, -43.1025, 71.0188, 43.1025], [ 39.7276, -13.5765, 83.1831, 13.5765], [ 34.0801, -17.1053, 88.8306, 17.1053], [ 26.9647, -21.5513, 95.9460, 21.5513], [ 46.0915, -19.2000, 76.8192, 19.2000], [ 42.0981, -24.1905, 80.8126, 24.1905], [ 37.0668, -30.4781, 85.8439, 30.4781], [ 50.5915, -27.1529, 72.3192, 27.1529], [ 47.7677, -34.2105, 75.1430, 34.2105], [ 44.2100, -43.1025, 78.7007, 43.1025], [ 47.4095, -13.5765, 90.8650, 13.5765], [ 41.7620, -17.1053, 96.5125, 17.1053], [ 34.6466, -21.5513, 103.6279, 21.5513], [ 53.7734, -19.2000, 84.5011, 19.2000], [ 49.7801, -24.1905, 88.4945, 24.1905], [ 44.7487, -30.4781, 93.5259, 30.4781], [ 58.2734, -27.1529, 80.0012, 27.1529], [ 55.4497, -34.2105, 82.8249, 34.2105], [ 51.8920, -43.1025, 86.3826, 43.1025], [ 55.0914, -13.5765, 98.5470, 13.5765], [ 49.4440, -17.1053, 104.1945, 17.1053], [ 42.3285, -21.5513, 111.3099, 21.5513], [ 61.4554, -19.2000, 92.1830, 19.2000], [ 57.4620, -24.1905, 96.1764, 24.1905], [ 52.4306, -30.4781, 101.2078, 30.4781], [ 65.9553, -27.1529, 87.6831, 27.1529], [ 63.1316, -34.2105, 90.5068, 34.2105], [ 59.5739, -43.1025, 94.0645, 43.1025], [ 62.7734, -13.5765, 106.2289, 13.5765], [ 57.1259, -17.1053, 111.8764, 17.1053], [ 50.0105, -21.5513, 118.9918, 21.5513], [ 69.1373, -19.2000, 99.8650, 19.2000], [ 65.1439, -24.1905, 103.8584, 24.1905], [ 60.1125, -30.4781, 108.8897, 30.4781], [ 73.6373, -27.1529, 95.3650, 27.1529], [ 70.8135, -34.2105, 98.1888, 34.2105], [ 67.2558, -43.1025, 101.7465, 43.1025], [ 70.4553, -13.5765, 113.9108, 13.5765], [ 64.8078, -17.1053, 119.5583,
17.1053], [ 57.6924, -21.5513, 126.6737, 21.5513], [ 76.8192, -19.2000, 107.5469, 19.2000], [ 72.8258, -24.1905, 111.5403, 24.1905], [ 67.7945, -30.4781, 116.5716, 30.4781], [ 81.3192, -27.1529, 103.0469, 27.1529], [ 78.4954, -34.2105, 105.8707, 34.2105], [ 74.9377, -43.1025, 109.4284, 43.1025], [ 78.1372, -13.5765, 121.5927, 13.5765], [ 72.4897, -17.1053, 127.2402, 17.1053], [ 65.3743, -21.5513, 134.3556, 21.5513], [ 84.5011, -19.2000, 115.2288, 19.2000], [ 80.5077, -24.1905, 119.2222, 24.1905], [ 75.4764, -30.4781, 124.2535, 30.4781], [ 89.0011, -27.1529, 110.7288, 27.1529], [ 86.1773, -34.2105, 113.5526, 34.2105], [ 82.6196, -43.1025, 117.1103, 43.1025], [ 85.8191, -13.5765, 129.2746, 13.5765], [ 80.1716, -17.1053, 134.9221, 17.1053], [ 73.0562, -21.5513, 142.0376, 21.5513], [ 92.1830, -19.2000, 122.9107, 19.2000], [ 88.1897, -24.1905, 126.9041, 24.1905], [ 83.1583, -30.4781, 131.9355, 30.4781], [ 96.6830, -27.1529, 118.4108, 27.1529], [ 93.8593, -34.2105, 121.2345, 34.2105], [ 90.3016, -43.1025, 124.7922, 43.1025], [ 93.5011, -13.5765, 136.9565, 13.5765], [ 87.8536, -17.1053, 142.6040, 17.1053], [ 80.7382, -21.5513, 149.7195, 21.5513], [ 99.8650, -19.2000, 130.5927, 19.2000], [ 95.8716, -24.1905, 134.5860, 24.1905], [ 90.8402, -30.4781, 139.6174, 30.4781], [104.3649, -27.1529, 126.0927, 27.1529], [101.5412, -34.2105, 128.9164, 34.2105], [ 97.9835, -43.1025, 132.4741, 43.1025], [101.1830, -13.5765, 144.6385, 13.5765], [ 95.5355, -17.1053, 150.2860, 17.1053], [ 88.4201, -21.5513, 157.4014, 21.5513], [107.5469, -19.2000, 138.2746, 19.2000], [103.5535, -24.1905, 142.2679, 24.1905], [ 98.5221, -30.4781, 147.2993, 30.4781], [112.0469, -27.1529, 133.7746, 27.1529], [109.2231, -34.2105, 136.5984, 34.2105], [105.6654, -43.1025, 140.1561, 43.1025], [108.8649, -13.5765, 152.3204, 13.5765], [103.2174, -17.1053, 157.9679, 17.1053], [ 96.1020, -21.5513, 165.0833, 21.5513], [115.2288, -19.2000, 145.9565, 19.2000], [111.2354, -24.1905, 149.9499, 24.1905], [106.2041, -30.4781, 
154.9812, 30.4781], [119.7288, -27.1529, 141.4565, 27.1529], [116.9050, -34.2105, 144.2803, 34.2105], [113.3473, -43.1025, 147.8380, 43.1025], [116.5468, -13.5765, 160.0023, 13.5765], [110.8993, -17.1053, 165.6498, 17.1053]]], device='cuda:0'), tensor([[304.0000, -32.0000, 368.0000, 32.0000, 295.6825, -40.3175, 376.3175, 40.3175, 285.2032, -50.7968, 386.7968, 50.7968, 313.3726, -45.2548, 358.6274, 45.2548, 307.4912, -57.0175, 364.5088, 57.0175, 300.0812, -71.8376, 371.9188, 71.8376, 306.7452, -22.6274, 397.2548, 22.6274, 294.9825, -28.5088, 409.0175, 28.5088, 280.1624, -35.9188, 423.8376, 35.9188, 320.0000, -32.0000, 384.0000, 32.0000, 311.6825, -40.3175, 392.3175, 40.3175, 301.2032, -50.7968, 402.7968, 50.7968, 329.3726, -45.2548, 374.6274, 45.2548, 323.4912, -57.0175, 380.5088, 57.0175, 316.0812, -71.8376, 387.9188, 71.8376, 322.7452, -22.6274, 413.2548, 22.6274, 310.9825, -28.5088, 425.0175, 28.5088, 296.1624, -35.9188, 439.8376, 35.9188, 336.0000, -32.0000, 400.0000, 32.0000, 327.6825, -40.3175, 408.3175, 40.3175, 317.2032, -50.7968, 418.7968, 50.7968, 345.3726, -45.2548, 390.6274, 45.2548, 339.4912, -57.0175, 396.5088, 57.0175, 332.0812, -71.8376, 403.9188, 71.8376, 338.7452, -22.6274, 429.2548, 22.6274]], device='cuda:0'), tensor([[348.0812, -71.8376, 419.9188, 71.8376, 354.7452, -22.6274, 445.2548, 22.6274, 342.9825, -28.5088, 457.0175, 28.5088, 328.1624, -35.9188, 471.8376, 35.9188, 368.0000, -32.0000, 432.0000, 32.0000, 359.6825, -40.3175, 440.3175, 40.3175, 349.2032, -50.7968, 450.7968, 50.7968, 377.3726, -45.2548, 422.6274, 45.2548, 371.4912, -57.0175, 428.5088, 57.0175, 364.0812, -71.8376, 435.9188, 71.8376, 370.7452, -22.6274, 461.2548, 22.6274, 358.9825, -28.5088, 473.0175, 28.5088, 344.1624, -35.9188, 487.8376, 35.9188, 384.0000, -32.0000, 448.0000, 32.0000, 375.6825, -40.3175, 456.3175, 40.3175, 365.2032, -50.7968, 466.7968, 50.7968, 393.3726, -45.2548, 438.6274, 45.2548, 387.4912, -57.0175, 444.5088, 57.0175, 380.0812, -71.8376, 451.9188, 71.8376, 
386.7452, -22.6274, 477.2548, 22.6274, 374.9825, -28.5088, 489.0175, 28.5088, 360.1624, -35.9188, 503.8376, 35.9188, 400.0000, -32.0000, 464.0000, 32.0000, 391.6825, -40.3175, 472.3175, 40.3175, 381.2032, -50.7968, 482.7968, 50.7968]], device='cuda:0')]

I debugged into torch2trt_dynamic.py, and it looks like the call "self.context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)" is not working. Could you give me some advice? Thank you so much!
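
One way to make "the results are wrong" concrete is a quick sanity check on the returned values. The sketch below assumes the first tensor is the detection count and the second holds boxes as [x1, y1, x2, y2]; that reading of the output is an assumption, and check_detections is a hypothetical helper, not part of the library.

```python
def check_detections(num_dets, boxes):
    """Return a list of human-readable problems found in one image's output.

    num_dets: int, the detection count reported by the engine (assumed meaning).
    boxes:    list of [x1, y1, x2, y2] rows (assumed layout).
    """
    problems = []
    if not 0 <= num_dets <= len(boxes):
        problems.append(f"num_detections={num_dets} is outside [0, {len(boxes)}]")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 < x1 or y2 < y1:
            problems.append(f"box {i} has negative width or height")
    return problems


# Values copied from the output above (count and the first two boxes only).
num_dets = -1070399664
boxes = [[26.3982, -17.1053, 81.1487, 17.1053],
         [19.2828, -21.5513, 88.2641, 21.5513]]
for problem in check_detections(num_dets, boxes):
    print(problem)
```

With real tensors, the inputs could be obtained with something like int(t.item()) for the count and t.cpu().tolist() for the boxes.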

Oct 14 '20 10:10 lexiqi

Also, would it be possible to provide a Dockerfile?

Oct 14 '20 10:10 lexiqi

execute_async_v2 is the inference entry point of TensorRT. The error is happening inside the model.

The project has changed a lot since my last reply. Please reinstall torch2trt_dynamic, amirstan_plugin, and mmdetection-to-tensorrt, then try again.

If the error still exists, you can create the TensorRT model and the wrapped (PyTorch) model as shown below, and check whether their results differ.

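    from mmdet2trt import mmdet2trt

    # return_wrap_model=True also returns the wrapped PyTorch model,
    # so both models can be run on the same input and compared.
    trt_model, wrap_model = mmdet2trt(cfg_path,
                                      model_path,
                                      opt_shape_param=opt_shape_param,
                                      max_workspace_size=1 << 32,
                                      trt_log_level="INFO",
                                      return_wrap_model=True,
                                      output_names=None)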

Then modify anchor_head.py (assuming you are using RetinaNet) to locate the layer that gives you different results. I will see if I can do something.

A Dockerfile is on my TODO list and will be added in the future.
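
The comparison step could be sketched roughly as follows. This is not code from the project: the helper names, the output names ("cls_score", "bbox_pred") and the tolerance are all hypothetical, and the outputs are assumed to have been converted to nested Python lists first, for example with tensor.cpu().tolist().

```python
def flatten(x):
    """Flatten arbitrarily nested lists/tuples of numbers into a flat list of floats."""
    if isinstance(x, (list, tuple)):
        out = []
        for item in x:
            out.extend(flatten(item))
        return out
    return [float(x)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError(f"size mismatch: {len(fa)} vs {len(fb)}")
    return max((abs(p - q) for p, q in zip(fa, fb)), default=0.0)


def first_divergence(names, ref_outputs, test_outputs, atol=1e-3):
    """Return (name, diff) for the first output pair that differs by more than atol, else None."""
    for name, ref, test in zip(names, ref_outputs, test_outputs):
        diff = max_abs_diff(ref, test)
        if diff > atol:
            return name, diff
    return None


# Toy data standing in for intermediate outputs of the two models.
names = ["cls_score", "bbox_pred"]
ref = [[[0.1, 0.9]], [[1.0, 2.0, 3.0, 4.0]]]
test = [[[0.1, 0.9]], [[1.0, 2.0, 3.5, 4.0]]]
print(first_divergence(names, ref, test))  # ('bbox_pred', 0.5)
```

Exposing intermediate tensors one layer at a time and re-running this check narrows down where the two models start to disagree.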

Oct 14 '20 12:10 grimoire

OK, I will try again.

Oct 14 '20 12:10 lexiqi
