error:batchedNMSPlugin.cpp

Open lexiqi opened this issue 5 years ago • 7 comments

Hi, I ran into this problem: #assertion/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,143 Aborted (core dumped)

Environment:

  • OS: [Ubuntu]
  • python_version: [3.6]
  • pytorch_version: [1.6]
  • cuda_version: [cuda-10.2]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.4]

Looking forward to your help. Thank you!

lexiqi avatar Sep 22 '20 11:09 lexiqi

Hi, could you provide the script, model file, and test image data?

grimoire avatar Sep 22 '20 11:09 grimoire

  • Script: https://github.com/grimoire/mmdetection-to-tensorrt/blob/master/demo/inference.py
  • Model: retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth (http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth)
  • Config: retinanet_r50_fpn_1x_coco.py (https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_fpn_1x_coco.py)
  • Image: coco_person

lexiqi avatar Sep 22 '20 11:09 lexiqi

Hi,
I have tested the image you provided, and the converter seems to work.

Here is my test script:

python demo/inference.py \
   test.jpg \
   retinanet_r50_fpn_1x_coco.py \
   retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
   retina.pth 

And result: Screenshot from 2020-09-22 20-10-44

Could you provide more detail on how to reproduce the error, such as the GPU device type, the arguments you pass to the script, or anything else that might be related?
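For gathering the details requested above, a minimal sketch like the following may help. The helper name `collect_env_info` is hypothetical and not part of this project; it uses only the standard library plus a few PyTorch attributes, and it skips the PyTorch fields when PyTorch is not installed.

```python
import platform
import sys


def collect_env_info():
    """Gather basic environment details for a bug report (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "argv": list(sys.argv),  # the arguments passed to the script
    }
    try:
        import torch  # optional: only queried when installed
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    else:
        info["gpu"] = "no CUDA device visible"
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the OS, Python, PyTorch, CUDA, and GPU fields in one go.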

grimoire avatar Sep 22 '20 12:09 grimoire

Hi, sorry to bother you again! I have now got it running, but the results are wrong, like this:

[tensor([[-1070399664]], device='cuda:0', dtype=torch.int32), tensor([[[ 26.3982, -17.1053, 81.1487, 17.1053], [ 19.2828, -21.5513, 88.2641, 21.5513], [ 38.4096, -19.2000, 69.1373, 19.2000], [ 34.4162, -24.1905, 73.1307, 24.1905], [ 29.3849, -30.4781, 78.1620, 30.4781], [ 42.9096, -27.1529, 64.6373, 27.1529], [ 40.0858, -34.2105, 67.4611, 34.2105], [ 36.5281, -43.1025, 71.0188, 43.1025], [ 39.7276, -13.5765, 83.1831, 13.5765], [ 34.0801, -17.1053, 88.8306, 17.1053], [ 26.9647, -21.5513, 95.9460, 21.5513], [ 46.0915, -19.2000, 76.8192, 19.2000], [ 42.0981, -24.1905, 80.8126, 24.1905], [ 37.0668, -30.4781, 85.8439, 30.4781], [ 50.5915, -27.1529, 72.3192, 27.1529], [ 47.7677, -34.2105, 75.1430, 34.2105], [ 44.2100, -43.1025, 78.7007, 43.1025], [ 47.4095, -13.5765, 90.8650, 13.5765], [ 41.7620, -17.1053, 96.5125, 17.1053], [ 34.6466, -21.5513, 103.6279, 21.5513], [ 53.7734, -19.2000, 84.5011, 19.2000], [ 49.7801, -24.1905, 88.4945, 24.1905], [ 44.7487, -30.4781, 93.5259, 30.4781], [ 58.2734, -27.1529, 80.0012, 27.1529], [ 55.4497, -34.2105, 82.8249, 34.2105], [ 51.8920, -43.1025, 86.3826, 43.1025], [ 55.0914, -13.5765, 98.5470, 13.5765], [ 49.4440, -17.1053, 104.1945, 17.1053], [ 42.3285, -21.5513, 111.3099, 21.5513], [ 61.4554, -19.2000, 92.1830, 19.2000], [ 57.4620, -24.1905, 96.1764, 24.1905], [ 52.4306, -30.4781, 101.2078, 30.4781], [ 65.9553, -27.1529, 87.6831, 27.1529], [ 63.1316, -34.2105, 90.5068, 34.2105], [ 59.5739, -43.1025, 94.0645, 43.1025], [ 62.7734, -13.5765, 106.2289, 13.5765], [ 57.1259, -17.1053, 111.8764, 17.1053], [ 50.0105, -21.5513, 118.9918, 21.5513], [ 69.1373, -19.2000, 99.8650, 19.2000], [ 65.1439, -24.1905, 103.8584, 24.1905], [ 60.1125, -30.4781, 108.8897, 30.4781], [ 73.6373, -27.1529, 95.3650, 27.1529], [ 70.8135, -34.2105, 98.1888, 34.2105], [ 67.2558, -43.1025, 101.7465, 43.1025], [ 70.4553, -13.5765, 113.9108, 13.5765], [ 64.8078, -17.1053, 119.5583, 
17.1053], [ 57.6924, -21.5513, 126.6737, 21.5513], [ 76.8192, -19.2000, 107.5469, 19.2000], [ 72.8258, -24.1905, 111.5403, 24.1905], [ 67.7945, -30.4781, 116.5716, 30.4781], [ 81.3192, -27.1529, 103.0469, 27.1529], [ 78.4954, -34.2105, 105.8707, 34.2105], [ 74.9377, -43.1025, 109.4284, 43.1025], [ 78.1372, -13.5765, 121.5927, 13.5765], [ 72.4897, -17.1053, 127.2402, 17.1053], [ 65.3743, -21.5513, 134.3556, 21.5513], [ 84.5011, -19.2000, 115.2288, 19.2000], [ 80.5077, -24.1905, 119.2222, 24.1905], [ 75.4764, -30.4781, 124.2535, 30.4781], [ 89.0011, -27.1529, 110.7288, 27.1529], [ 86.1773, -34.2105, 113.5526, 34.2105], [ 82.6196, -43.1025, 117.1103, 43.1025], [ 85.8191, -13.5765, 129.2746, 13.5765], [ 80.1716, -17.1053, 134.9221, 17.1053], [ 73.0562, -21.5513, 142.0376, 21.5513], [ 92.1830, -19.2000, 122.9107, 19.2000], [ 88.1897, -24.1905, 126.9041, 24.1905], [ 83.1583, -30.4781, 131.9355, 30.4781], [ 96.6830, -27.1529, 118.4108, 27.1529], [ 93.8593, -34.2105, 121.2345, 34.2105], [ 90.3016, -43.1025, 124.7922, 43.1025], [ 93.5011, -13.5765, 136.9565, 13.5765], [ 87.8536, -17.1053, 142.6040, 17.1053], [ 80.7382, -21.5513, 149.7195, 21.5513], [ 99.8650, -19.2000, 130.5927, 19.2000], [ 95.8716, -24.1905, 134.5860, 24.1905], [ 90.8402, -30.4781, 139.6174, 30.4781], [104.3649, -27.1529, 126.0927, 27.1529], [101.5412, -34.2105, 128.9164, 34.2105], [ 97.9835, -43.1025, 132.4741, 43.1025], [101.1830, -13.5765, 144.6385, 13.5765], [ 95.5355, -17.1053, 150.2860, 17.1053], [ 88.4201, -21.5513, 157.4014, 21.5513], [107.5469, -19.2000, 138.2746, 19.2000], [103.5535, -24.1905, 142.2679, 24.1905], [ 98.5221, -30.4781, 147.2993, 30.4781], [112.0469, -27.1529, 133.7746, 27.1529], [109.2231, -34.2105, 136.5984, 34.2105], [105.6654, -43.1025, 140.1561, 43.1025], [108.8649, -13.5765, 152.3204, 13.5765], [103.2174, -17.1053, 157.9679, 17.1053], [ 96.1020, -21.5513, 165.0833, 21.5513], [115.2288, -19.2000, 145.9565, 19.2000], [111.2354, -24.1905, 149.9499, 24.1905], [106.2041, -30.4781, 
154.9812, 30.4781], [119.7288, -27.1529, 141.4565, 27.1529], [116.9050, -34.2105, 144.2803, 34.2105], [113.3473, -43.1025, 147.8380, 43.1025], [116.5468, -13.5765, 160.0023, 13.5765], [110.8993, -17.1053, 165.6498, 17.1053]]], device='cuda:0'), tensor([[304.0000, -32.0000, 368.0000, 32.0000, 295.6825, -40.3175, 376.3175, 40.3175, 285.2032, -50.7968, 386.7968, 50.7968, 313.3726, -45.2548, 358.6274, 45.2548, 307.4912, -57.0175, 364.5088, 57.0175, 300.0812, -71.8376, 371.9188, 71.8376, 306.7452, -22.6274, 397.2548, 22.6274, 294.9825, -28.5088, 409.0175, 28.5088, 280.1624, -35.9188, 423.8376, 35.9188, 320.0000, -32.0000, 384.0000, 32.0000, 311.6825, -40.3175, 392.3175, 40.3175, 301.2032, -50.7968, 402.7968, 50.7968, 329.3726, -45.2548, 374.6274, 45.2548, 323.4912, -57.0175, 380.5088, 57.0175, 316.0812, -71.8376, 387.9188, 71.8376, 322.7452, -22.6274, 413.2548, 22.6274, 310.9825, -28.5088, 425.0175, 28.5088, 296.1624, -35.9188, 439.8376, 35.9188, 336.0000, -32.0000, 400.0000, 32.0000, 327.6825, -40.3175, 408.3175, 40.3175, 317.2032, -50.7968, 418.7968, 50.7968, 345.3726, -45.2548, 390.6274, 45.2548, 339.4912, -57.0175, 396.5088, 57.0175, 332.0812, -71.8376, 403.9188, 71.8376, 338.7452, -22.6274, 429.2548, 22.6274]], device='cuda:0'), tensor([[348.0812, -71.8376, 419.9188, 71.8376, 354.7452, -22.6274, 445.2548, 22.6274, 342.9825, -28.5088, 457.0175, 28.5088, 328.1624, -35.9188, 471.8376, 35.9188, 368.0000, -32.0000, 432.0000, 32.0000, 359.6825, -40.3175, 440.3175, 40.3175, 349.2032, -50.7968, 450.7968, 50.7968, 377.3726, -45.2548, 422.6274, 45.2548, 371.4912, -57.0175, 428.5088, 57.0175, 364.0812, -71.8376, 435.9188, 71.8376, 370.7452, -22.6274, 461.2548, 22.6274, 358.9825, -28.5088, 473.0175, 28.5088, 344.1624, -35.9188, 487.8376, 35.9188, 384.0000, -32.0000, 448.0000, 32.0000, 375.6825, -40.3175, 456.3175, 40.3175, 365.2032, -50.7968, 466.7968, 50.7968, 393.3726, -45.2548, 438.6274, 45.2548, 387.4912, -57.0175, 444.5088, 57.0175, 380.0812, -71.8376, 451.9188, 71.8376, 
386.7452, -22.6274, 477.2548, 22.6274, 374.9825, -28.5088, 489.0175, 28.5088, 360.1624, -35.9188, 503.8376, 35.9188, 400.0000, -32.0000, 464.0000, 32.0000, 391.6825, -40.3175, 472.3175, 40.3175, 381.2032, -50.7968, 482.7968, 50.7968]], device='cuda:0')]

I debugged into torch2trt_dynamic.py, and it looks like the call "self.context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)" does not work. Could you give me some advice? Thank you so much!
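One thing visible in the output above is that the first tensor holds -1070399664, which cannot be a valid detection count. The sketch below shows a small sanity check for that kind of result. It assumes the first output is the number of detections and the second a list of [x1, y1, x2, y2] boxes, already converted to plain Python values (for example with `.item()` and `.tolist()`); the function name and the `max_detections=100` limit are hypothetical choices for illustration, not part of the project's API.

```python
def check_detection_output(num_detections, boxes, max_detections=100):
    """Return a list of problems; an empty list means the output looks plausible.

    Assumptions (not taken from the project): num_detections is a plain int,
    boxes is a list of [x1, y1, x2, y2], and max_detections is the NMS keep limit.
    """
    problems = []
    if not 0 <= num_detections <= max_detections:
        problems.append(
            f"num_detections={num_detections} is outside [0, {max_detections}]"
        )
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # A valid box has its bottom-right corner at or beyond its top-left corner.
        if x2 < x1 or y2 < y1:
            problems.append(f"box {i} has inverted corners: {[x1, y1, x2, y2]}")
    return problems


# Values copied from the first rows of the output in this post.
for problem in check_detection_output(
    -1070399664,
    [[26.3982, -17.1053, 81.1487, 17.1053], [19.2828, -21.5513, 88.2641, 21.5513]],
):
    print(problem)
```

A check like this only flags obviously invalid results; it does not say which layer produced them.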

lexiqi avatar Oct 14 '20 10:10 lexiqi

Also, would it be possible to provide a Dockerfile?

lexiqi avatar Oct 14 '20 10:10 lexiqi

execute_async_v2 is the inference entry point of TensorRT, so the error is happening inside the model.

The project has changed a lot since my last reply. Please reinstall torch2trt_dynamic, amirstan_plugin, and mmdetection-to-tensorrt, then try again.

If the error still exists, you can create both the TensorRT model and the wrapped PyTorch model as shown below, and check whether their results differ.

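    from mmdet2trt import mmdet2trt

    # return_wrap_model=True also returns the wrapped PyTorch model for comparison
    trt_model, wrap_model = mmdet2trt(cfg_path,
                                      model_path,
                                      opt_shape_param=opt_shape_param,
                                      max_workspace_size=1 << 32,
                                      trt_log_level="INFO",
                                      return_wrap_model=True,
                                      output_names=None)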

Then modify anchor_head.py (assuming you are using RetinaNet) to locate the layer that gives you different results. I will see if I can do something.
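The comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the outputs of both models have been converted to nested Python lists (for example with `.cpu().tolist()`), the helper names `max_abs_diff` and `first_mismatch` are hypothetical, and the tolerance `atol=1e-3` is an arbitrary starting point rather than a recommended value.

```python
def flatten(values):
    """Yield every number in an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError(f"size mismatch: {len(flat_a)} vs {len(flat_b)}")
    return max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)


def first_mismatch(trt_outputs, ref_outputs, names, atol=1e-3):
    """Return (name, diff) for the first output that differs by more than atol, else None."""
    for name, trt_out, ref_out in zip(names, trt_outputs, ref_outputs):
        diff = max_abs_diff(trt_out, ref_out)
        if diff > atol:
            return name, diff
    return None


# Toy values standing in for intermediate outputs of the two models.
print(first_mismatch([[1.0, 2.0], [3.0]], [[1.0, 2.0], [3.5]], ["cls_score", "bbox_pred"]))
```

Exposing intermediate tensors one at a time and running a comparison like this narrows the search to the first layer whose results diverge.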

A Dockerfile is on my TODO list and will be added in the future.

grimoire avatar Oct 14 '20 12:10 grimoire

OK, I will try again.

lexiqi avatar Oct 14 '20 12:10 lexiqi