mmdetection-to-tensorrt

Unable to use converted YOLOv3 model

Open minmaxmean opened this issue 4 years ago • 5 comments

Describe the bug I'm trying to convert a trained model based on YOLOv3 from mmdet in order to use it in the NVIDIA Triton Inference Server. The conversion using mmdet2trt finished successfully, but when I try to run the model using inference_detector it throws an exception:

WARNING:root:module mmdet.models.dense_heads.TransformerHead not exist.
Use load_from_local loader
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::1137, condition: profileMinDims.d[i] <= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1318, condition: allInputDimensionsSpecified(routine)
Traceback (most recent call last):
  File "converter/mmdetection-to-tensorrt/demo/inference.py", line 63, in <module>
    main()
  File "converter/mmdetection-to-tensorrt/demo/inference.py", line 33, in main
    result = inference_detector(trt_model, image_path, cfg_path, args.device)
  File "/workspace/converter/mmdetection-to-tensorrt/mmdet2trt/apis/inference.py", line 48, in inference_detector
    result = model(tensor)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/converter/torch2trt_dynamic/torch2trt_dynamic/torch2trt_dynamic.py", line 478, in forward
    shape = tuple(self.context.get_binding_shape(idx))
ValueError: __len__() should return >= 0

mmdetection-to-tensorrt/demo/inference.py fails with the same error message.

To Reproduce

  • Download the model checkpoint and config from here
  • Run
python converter/mmdetection-to-tensorrt/demo/inference.py \
    test.jpg \
    yolo_cropper.py \
    yolo_cropper.pth \
    yolo_cropper.trt.pth

Environment:

  • Host OS: Manjaro Linux
  • Dev Container: based on the Docker image nvcr.io/nvidia/tensorrt:21.06-py3
  • python_version: 3.8.5
  • pytorch_version: 1.8.1+cu111
  • cuda_version: 11.3.1
  • cudnn_version: 8.2.1
  • mmdetection_version: 2.12.0


minmaxmean avatar Jul 09 '21 09:07 minmaxmean

You might need to set a different opt_shape_param when you convert your model, since the default config is for two-stage or RetinaNet-like models. Read this for details.
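
For example, the conversion call could look like the sketch below (the opt_shape_param layout follows this repo's README; the shapes, fp16_mode and max_workspace_size values are placeholders you should adapt to your model):

from mmdet2trt import mmdet2trt
import torch

# each entry is [batch, channels, height, width]
opt_shape_param = [
    [
        [1, 3, 320, 320],    # min shape
        [1, 3, 608, 608],    # opt shape (tactics are tuned for this size)
        [1, 3, 1344, 1344],  # max shape
    ]
]

trt_model = mmdet2trt(
    'yolo_cropper.py',            # mmdetection config
    'yolo_cropper.pth',           # checkpoint
    opt_shape_param=opt_shape_param,
    fp16_mode=True,
    max_workspace_size=1 << 30,
)
torch.save(trt_model.state_dict(), 'yolo_cropper.trt.pth')

Make sure the min shape is no larger than the smallest input you will feed at inference time; otherwise setBindingDimensions fails the profileMinDims check shown in your traceback.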

grimoire avatar Jul 09 '21 09:07 grimoire

Thank you, I think changing min_shape solved that problem. But now I have a problem: when I run the model in Triton, it fails with the message

I0709 10:05:41.195130 1 plan_backend.cc:2513] Running yolo_cropper_0_gpu0 with 1 requests
I0709 10:05:41.195174 1 plan_backend.cc:3431] Optimization profile default [0] is selected for yolo_cropper_0_gpu0
I0709 10:05:41.195216 1 pinned_memory_manager.cc:161] pinned memory allocation: size 3326976, addr 0x7f378a000090
I0709 10:05:41.195995 1 plan_backend.cc:2936] Context with profile default [0] is being executed for yolo_cropper_0_gpu0
E0709 10:05:41.196090 1 logging.cc:43] (Unnamed Layer* 214) [Concatenation]: dimensions not compatible for concatenation
E0709 10:05:41.196099 1 logging.cc:43] shapeMachine.cpp (276) - Shape Error in operator(): condition '==' violated
E0709 10:05:41.196104 1 logging.cc:43] Instruction: CHECK_EQUAL 30 29

Do you know what concatenation it could be referring to? The input I pass is shaped (1, 3, 608, 456).

minmaxmean avatar Jul 09 '21 10:07 minmaxmean

Errr, I do not have much experience with Triton. According to the logs, it seems like the input tensor shape is not a multiple of 32 (this limit comes from mmdet). Please check that the preprocessing is the same as in mmdetection.
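
For reference, a minimal sketch of that padding step (a hypothetical helper mirroring mmdetection's Pad(size_divisor=32) transform, not part of mmdet2trt):

import torch
import torch.nn.functional as F

def pad_to_size_divisor(img, divisor=32):
    # pad an NCHW tensor on the bottom/right so H and W become multiples of `divisor`
    h, w = img.shape[-2:]
    pad_h = (divisor - h % divisor) % divisor
    pad_w = (divisor - w % divisor) % divisor
    # F.pad pads the last two dims in the order (left, right, top, bottom)
    return F.pad(img, (0, pad_w, 0, pad_h), value=0)

x = torch.zeros(1, 3, 608, 456)
print(pad_to_size_divisor(x).shape)  # torch.Size([1, 3, 608, 480]); 456 is padded up to 480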

grimoire avatar Jul 09 '21 10:07 grimoire

Thank you for the direction. It seems like it was indeed a padding problem; interestingly, this problem did not occur with the Faster R-CNN model.

However, now I have a problem where Triton just exits with only this line printed:

#assertion/workspace/converter/amirstan_plugin/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,132

https://github.com/grimoire/amirstan_plugin/blob/ca8d16fadbf169edcf27541d4044fc2115544998/src/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp#L132

I believe this is related to amirstan_plugin, the batchedNMS plugin layer in particular. Is there any way to enable logging for that plugin in order to debug this situation?

minmaxmean avatar Jul 09 '21 11:07 minmaxmean

I just add a print statement in the code and build it again.

grimoire avatar Jul 09 '21 14:07 grimoire