mmdeploy
[BUG] TensorRT-optimised model detects fewer objects than the PyTorch model, most likely due to a difference in post-processing.
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. I have read the FAQ documentation but cannot get the expected help.
- [X] 3. The bug has not been fixed in the latest version.
Describe the bug
I trained a model on a non-square input size (height 1216, width 1920). I then converted it to TensorRT with FP16 precision using mmdeploy's tools/deploy.py script. However, when visualising the sample result, the TensorRT model detects fewer objects than the PyTorch model. I believe this is not a problem with optimisation or quantisation, because the objects that the TensorRT model does detect have exactly the same location and confidence as in the PyTorch model. Moreover, the TensorRT model only misses objects where they are closely and densely packed, which leads me to believe there is a discrepancy in the post-processing pipeline. Please help me identify and fix the problem. I'm attaching all the config files below for your reference.
Model config file
default_scope = 'mmdet'
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=10),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='DetVisualizationHook'))
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'))
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='DetLocalVisualizer',
vis_backends=[dict(type='LocalVisBackend')],
name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
log_level = 'INFO'
load_from = '/media/chetan/Project/Projects/rtmdet_train/mmdetection/work_dirs/config_corrected_det/epoch_160.pth'
resume = True
train_cfg = dict(
type='EpochBasedTrainLoop',
max_epochs=300,
val_interval=1,
dynamic_intervals=[(80, 1)])
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [
dict(
type='LinearLR', start_factor=1e-05, by_epoch=False, begin=0,
end=1000),
dict(
type='CosineAnnealingLR',
eta_min=0.0002,
begin=150,
end=300,
T_max=100,
by_epoch=True,
convert_to_iter_based=True)
]
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='AdamW', lr=0.001, weight_decay=0.05),
paramwise_cfg=dict(
norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
auto_scale_lr = dict(enable=False, base_batch_size=96)
dataset_type = 'CocoDataset'
data_root = '/home/chetan/Desktop/rtmdet_training/coco_finetuning_data'
backend_args = None
train_pipeline = [
dict(type='LoadImageFromFile', backend_args=None),
dict(
type='LoadAnnotations',
with_bbox=True,
with_mask=False,
poly2mask=False),
dict(type='CachedMosaic', img_scale=(1920, 1216), pad_val=114.0, prob=0.2),
dict(
type='RandomResize',
scale=(1920, 1216),
ratio_range=(0.8, 1.2),
keep_ratio=True,
prob=0.1),
dict(
type='RandomCrop',
crop_size=(1920, 1216),
recompute_bbox=True,
allow_negative_crop=True,
prob=0.1),
dict(type='YOLOXHSVRandomAug', prob=0.1),
dict(type='RandomFlip', prob=0.5),
dict(type='Pad', size=(1920, 1216), pad_val=dict(img=(114, 114, 114))),
dict(
type='CachedMixUp',
img_scale=(1920, 1216),
ratio_range=(1.0, 1.0),
max_cached_images=20,
prob=0.1,
pad_val=(114, 114, 114)),
dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)),
dict(type='PackDetInputs')
]
test_pipeline = [
dict(type='LoadImageFromFile', backend_args=None),
dict(type='Resize', scale=(1920, 1216), keep_ratio=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
train_dataloader = dict(
batch_size=96,
num_workers=8,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=None,
dataset=dict(
type='CocoDataset',
data_root='/home/chetan/Desktop/rtmdet_training/coco_finetuning_data',
ann_file='train/coco_annotations.json',
data_prefix=dict(img='train/'),
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(type='LoadImageFromFile', backend_args=None),
dict(
type='LoadAnnotations',
with_bbox=True,
with_mask=False,
poly2mask=False),
dict(
type='CachedMosaic',
img_scale=(1920, 1216),
pad_val=114.0,
prob=0.2),
dict(
type='RandomResize',
scale=(1920, 1216),
ratio_range=(0.8, 1.2),
keep_ratio=True),
dict(
type='RandomCrop',
crop_size=(1920, 1216),
recompute_bbox=True,
allow_negative_crop=True),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip', prob=0.5),
dict(
type='Pad', size=(1920, 1216),
pad_val=dict(img=(114, 114, 114))),
dict(
type='CachedMixUp',
img_scale=(1920, 1216),
ratio_range=(1.0, 1.0),
max_cached_images=20,
pad_val=(114, 114, 114),
prob=0.2),
dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)),
dict(type='PackDetInputs')
],
backend_args=None,
metainfo=dict(
classes=('Neoplastic', 'Inflammatory', 'Stroma',
'Necrosis/Dead Cells', 'Normal Epithelial'),
palette=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
(0, 255, 255)])),
pin_memory=True)
val_dataloader = dict(
batch_size=32,
num_workers=8,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='CocoDataset',
data_root='/home/chetan/Desktop/rtmdet_training/coco_finetuning_data',
ann_file='val/coco_annotations.json',
data_prefix=dict(img='val/'),
test_mode=True,
pipeline=[
dict(type='LoadImageFromFile', backend_args=None),
dict(type='Resize', scale=(1920, 1216), keep_ratio=True),
dict(
type='Pad', size=(1920, 1216),
pad_val=dict(img=(114, 114, 114))),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
],
backend_args=None,
metainfo=dict(
classes=('Neoplastic', 'Inflammatory', 'Stroma',
'Necrosis/Dead Cells', 'Normal Epithelial'),
palette=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
(0, 255, 255)])))
test_dataloader = dict(
batch_size=64,
num_workers=8,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='CocoDataset',
data_root='/home/chetan/Desktop/rtmdet_training/coco_finetuning_data',
ann_file='val/coco_annotations.json',
data_prefix=dict(img='val/'),
test_mode=True,
pipeline=[
dict(type='LoadImageFromFile', backend_args=None),
dict(type='Resize', scale=(1920, 1216), keep_ratio=True),
dict(
type='Pad', size=(1920, 1216),
pad_val=dict(img=(114, 114, 114))),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
],
backend_args=None,
metainfo=dict(
classes=('Neoplastic', 'Inflammatory', 'Stroma',
'Necrosis/Dead Cells', 'Normal Epithelial'),
palette=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
(0, 255, 255)])))
val_evaluator = dict(
type='CocoMetric',
ann_file=
'/home/chetan/Desktop/rtmdet_training/coco_finetuning_data/val/coco_annotations.json',
metric='bbox',
format_only=False,
backend_args=None,
proposal_nums=(3000, 1, 10))
test_evaluator = dict(
type='CocoMetric',
ann_file=
'/home/chetan/Desktop/rtmdet_training/coco_finetuning_data/val/coco_annotations.json',
metric='bbox',
format_only=False,
backend_args=None,
proposal_nums=(3000, 1, 10))
tta_model = dict(
type='DetTTAModel',
tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.6), max_per_img=3000))
img_scales = [(1920, 1216), (256, 256)]
tta_pipeline = [
dict(type='LoadImageFromFile', backend_args=None),
dict(
type='TestTimeAug',
transforms=[[{
'type': 'Resize',
'scale': (1920, 1216),
'keep_ratio': True,
'prob': 0.0
}, {
'type': 'Resize',
'scale': (144, 128),
'keep_ratio': True,
'prob': 0.0
}, {
'type': 'Resize',
'scale': (576, 512),
'keep_ratio': True,
'prob': 0.0
}],
[{
'type': 'RandomFlip',
'prob': 0.0
}, {
'type': 'RandomFlip',
'prob': 0.0
}],
[{
'type': 'Pad',
'size': (1920, 1216),
'pad_val': {
'img': (114, 114, 114)
},
'prob': 0.0
}],
[{
'type':
'PackDetInputs',
'meta_keys':
('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor', 'flip', 'flip_direction')
}]])
]
model = dict(
type='RTMDet',
data_preprocessor=dict(
type='DetDataPreprocessor',
mean=[179.92, 149.48, 198.26],
std=[14.06, 11.88, 11.06],
bgr_to_rgb=True,
batch_augments=None),
backbone=dict(
type='CSPNeXt',
arch='P5',
expand_ratio=0.5,
deepen_factor=0.67,
widen_factor=0.75,
channel_attention=True,
norm_cfg=dict(type='SyncBN'),
act_cfg=dict(type='SiLU', inplace=True)),
neck=dict(
type='CSPNeXtPAFPN',
in_channels=[192, 384, 768],
out_channels=192,
num_csp_blocks=2,
expand_ratio=0.5,
norm_cfg=dict(type='SyncBN'),
act_cfg=dict(type='SiLU', inplace=True)),
bbox_head=dict(
type='RTMDetSepBNHead',
num_classes=5,
in_channels=192,
stacked_convs=2,
feat_channels=192,
anchor_generator=dict(
type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
bbox_coder=dict(type='DistancePointBBoxCoder'),
loss_cls=dict(
type='QualityFocalLoss',
use_sigmoid=True,
beta=2.0,
loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
with_objectness=False,
exp_on_reg=True,
share_conv=True,
pred_kernel_size=1,
norm_cfg=dict(type='SyncBN'),
act_cfg=dict(type='SiLU', inplace=True)),
train_cfg=dict(
assigner=dict(type='DynamicSoftLabelAssigner', topk=13),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=30000,
min_bbox_size=0,
score_thr=0.001,
nms=dict(type='nms', iou_threshold=0.6),
max_per_img=3000))
train_pipeline_stage2 = [
dict(type='LoadImageFromFile', backend_args=None),
dict(
type='LoadAnnotations',
with_bbox=True,
with_mask=False,
poly2mask=False),
dict(
type='RandomResize',
scale=(1920, 1216),
ratio_range=(0.8, 1.2),
keep_ratio=True),
dict(
type='RandomCrop',
crop_size=(1920, 1216),
recompute_bbox=True,
allow_negative_crop=True),
dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip', prob=0.5),
dict(type='Pad', size=(1920, 1216), pad_val=dict(img=(114, 114, 114))),
dict(type='PackDetInputs')
]
max_epochs = 300
stage2_num_epochs = 20
base_lr = 0.001
interval = 10
custom_hooks = [
dict(
type='EMAHook',
ema_type='ExpMomentumEMA',
momentum=0.0002,
update_buffers=True,
priority=49),
dict(
type='PipelineSwitchHook',
switch_epoch=280,
switch_pipeline=[
dict(type='LoadImageFromFile', backend_args=None),
dict(
type='LoadAnnotations',
with_bbox=True,
with_mask=False,
poly2mask=False),
dict(
type='RandomResize',
scale=(1920, 1216),
ratio_range=(0.8, 1.2),
keep_ratio=True),
dict(
type='RandomCrop',
crop_size=(1920, 1216),
recompute_bbox=True,
allow_negative_crop=True),
dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)),
dict(type='YOLOXHSVRandomAug', prob=0.1),
dict(type='RandomFlip', prob=0.5),
dict(
type='Pad', size=(1920, 1216),
pad_val=dict(img=(114, 114, 114))),
dict(type='PackDetInputs')
])
]
metainfo = dict(
classes=('Neoplastic', 'Inflammatory', 'Stroma', 'Necrosis/Dead Cells',
'Normal Epithelial'),
palette=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
(0, 255, 255)])
launcher = 'none'
work_dir = './work_dirs/config_corrected_det_finetune'
The config file base_static.py
_base_ = ['../../_base_/onnx_config.py']
onnx_config = dict(output_names=['dets', 'labels'], input_shape=None)
codebase_config = dict(
type='mmdet',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.05,
confidence_threshold=0.005, # for YOLOv3
iou_threshold=0.6,
max_output_boxes_per_class=3000,
pre_top_k=5000,
keep_top_k=3000,
background_label_id=-1,
))
The tensorrt static optimisation
_base_ = ['./base_static.py', '../../_base_/backends/tensorrt.py']
onnx_config = dict(input_shape=(1920, 1216))
backend_config = dict(
common_config=dict(max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 1216, 1920],
opt_shape=[1, 3, 1216, 1920],
max_shape=[1, 3, 1216, 1920])))
])
Below are the detail.json, pipeline.json and deploy.json
deploy.json
{
"version": "1.0.0",
"task": "Detector",
"models": [
{
"name": "rtmdet",
"net": "end2end.engine",
"weights": "",
"backend": "tensorrt",
"precision": "FP16",
"batch_size": 1,
"dynamic_shape": false
}
],
"customs": []
}
detail.json
{
"version": "1.0.0",
"codebase": {
"task": "ObjectDetection",
"codebase": "mmdet",
"version": "3.0.0",
"pth": "/root/workspace/data/finetune_checkpoint_static/epoch_240.pth",
"config": "/root/workspace/data/finetune_checkpoint_static/config_corrected_det_finetune.py"
},
"codebase_config": {
"type": "mmdet",
"task": "ObjectDetection",
"model_type": "end2end",
"post_processing": {
"score_threshold": 0.05,
"confidence_threshold": 0.005,
"iou_threshold": 0.6,
"max_output_boxes_per_class": 3000,
"pre_top_k": 5000,
"keep_top_k": 3000,
"background_label_id": -1
}
},
"onnx_config": {
"type": "onnx",
"export_params": true,
"keep_initializers_as_inputs": false,
"opset_version": 11,
"save_file": "end2end.onnx",
"input_names": [
"input"
],
"output_names": [
"dets",
"labels"
],
"input_shape": [
1920,
1216
],
"optimize": true
},
"backend_config": {
"type": "tensorrt",
"common_config": {
"fp16_mode": true,
"max_workspace_size": 1073741824
},
"model_inputs": [
{
"input_shapes": {
"input": {
"min_shape": [
1,
3,
1216,
1920
],
"opt_shape": [
1,
3,
1216,
1920
],
"max_shape": [
1,
3,
1216,
1920
]
}
}
}
]
},
"calib_config": {}
}
And finally the pipeline.json
{
"pipeline": {
"input": [
"img"
],
"output": [
"post_output"
],
"tasks": [
{
"type": "Task",
"module": "Transform",
"name": "Preprocess",
"input": [
"img"
],
"output": [
"prep_output"
],
"transforms": [
{
"type": "LoadImageFromFile",
"backend_args": null
},
{
"type": "Resize",
"keep_ratio": false,
"size": [
1920,
1216
]
},
{
"type": "Normalize",
"to_rgb": true,
"mean": [
179.92,
149.48,
198.26
],
"std": [
14.06,
11.88,
11.06
]
},
{
"type": "Pad",
"size_divisor": 1
},
{
"type": "DefaultFormatBundle"
},
{
"type": "Collect",
"meta_keys": [
"flip",
"img_shape",
"scale_factor",
"flip_direction",
"filename",
"img_path",
"img_id",
"img_norm_cfg",
"valid_ratio",
"pad_param",
"pad_shape",
"ori_filename",
"ori_shape"
],
"keys": [
"img"
]
}
]
},
{
"name": "rtmdet",
"type": "Task",
"module": "Net",
"is_batched": false,
"input": [
"prep_output"
],
"output": [
"infer_output"
],
"input_map": {
"img": "input"
},
"output_map": {}
},
{
"type": "Task",
"module": "mmdet",
"name": "postprocess",
"component": "ResizeBBox",
"params": {
"nms_pre": 30000,
"min_bbox_size": 0,
"score_thr": 0.001,
"nms": {
"type": "nms",
"iou_threshold": 0.6
},
"max_per_img": 3000
},
"output": [
"post_output"
],
"input": [
"prep_output",
"infer_output"
]
}
]
}
}
Reproduction
python /root/workspace/mmdeploy/tools/deploy.py \
/root/workspace/mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_static-AOI.py \
/root/workspace/data/finetune_checkpoint_static/config_corrected_det_finetune.py \
/root/workspace/data/finetune_checkpoint_static/epoch_240.pth \
/root/workspace/data/1105.png \
--test-img /root/workspace/data/1105.png \
--work-dir /root/workspace/data/finetune_checkpoint_static \
--device cuda \
--log-level INFO \
--show \
--dump-info
I modified the config files to accommodate the resolution of 1216 x 1920. I understand all the changes required, and the PyTorch model works flawlessly. However, the TensorRT-optimised model fails to predict some objects that are densely located.
Environment
05/11 09:00:49 - mmengine - INFO -
05/11 09:00:49 - mmengine - INFO - **********Environmental information**********
05/11 09:00:50 - mmengine - INFO - sys.platform: linux
05/11 09:00:50 - mmengine - INFO - Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
05/11 09:00:50 - mmengine - INFO - CUDA available: True
05/11 09:00:50 - mmengine - INFO - numpy_random_seed: 2147483648
05/11 09:00:50 - mmengine - INFO - GPU 0: NVIDIA GeForce GTX 1650
05/11 09:00:50 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
05/11 09:00:50 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
05/11 09:00:50 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
05/11 09:00:50 - mmengine - INFO - PyTorch: 1.11.0+cu113
05/11 09:00:50 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
05/11 09:00:50 - mmengine - INFO - TorchVision: 0.12.0+cu113
05/11 09:00:50 - mmengine - INFO - OpenCV: 4.7.0
05/11 09:00:50 - mmengine - INFO - MMEngine: 0.7.3
05/11 09:00:50 - mmengine - INFO - MMCV: 2.0.0
05/11 09:00:50 - mmengine - INFO - MMCV Compiler: GCC 9.3
05/11 09:00:50 - mmengine - INFO - MMCV CUDA Compiler: 11.3
05/11 09:00:50 - mmengine - INFO - MMDeploy: 1.0.0+
05/11 09:00:50 - mmengine - INFO -
05/11 09:00:50 - mmengine - INFO - **********Backend information**********
05/11 09:00:50 - mmengine - INFO - tensorrt: 8.2.4.2
05/11 09:00:50 - mmengine - INFO - tensorrt custom ops: Available
05/11 09:00:50 - mmengine - INFO - ONNXRuntime: None
05/11 09:00:50 - mmengine - INFO - ONNXRuntime-gpu: 1.8.1
05/11 09:00:50 - mmengine - INFO - ONNXRuntime custom ops: Available
05/11 09:00:50 - mmengine - INFO - pplnn: None
05/11 09:00:50 - mmengine - INFO - ncnn: None
05/11 09:00:50 - mmengine - INFO - snpe: None
05/11 09:00:50 - mmengine - INFO - openvino: None
05/11 09:00:50 - mmengine - INFO - torchscript: 1.11.0+cu113
05/11 09:00:50 - mmengine - INFO - torchscript custom ops: NotAvailable
05/11 09:00:50 - mmengine - INFO - rknn-toolkit: None
05/11 09:00:50 - mmengine - INFO - rknn-toolkit2: None
05/11 09:00:50 - mmengine - INFO - ascend: None
05/11 09:00:50 - mmengine - INFO - coreml: None
05/11 09:00:50 - mmengine - INFO - tvm: None
05/11 09:00:50 - mmengine - INFO - vacc: None
05/11 09:00:50 - mmengine - INFO -
05/11 09:00:50 - mmengine - INFO - **********Codebase information**********
05/11 09:00:50 - mmengine - INFO - mmdet: 3.0.0
05/11 09:00:50 - mmengine - INFO - mmseg: None
05/11 09:00:50 - mmengine - INFO - mmpretrain: None
05/11 09:00:50 - mmengine - INFO - mmocr: None
05/11 09:00:50 - mmengine - INFO - mmedit: None
05/11 09:00:50 - mmengine - INFO - mmdet3d: None
05/11 09:00:50 - mmengine - INFO - mmpose: None
05/11 09:00:50 - mmengine - INFO - mmrotate: None
05/11 09:00:50 - mmengine - INFO - mmaction: None
05/11 09:00:50 - mmengine - INFO - mmrazor: None
Error traceback
_No response_
test_pipeline = [
dict(type='LoadImageFromFile', backend_args=None),
dict(type='Resize', scale=(1920, 1216), keep_ratio=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
Please try detection_tensorrt-xxx_dynamic-xx.py and edit the min/opt/max shapes according to your needs. When you run inference with PyTorch, the resize strategy in preprocessing is a keep-ratio resize. With a static model config, however, the resize is replaced by a fixed-size resize, which is not the same as the PyTorch preprocessing.
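The difference between the two resize strategies can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming the keep-ratio rule picks the smaller of the long-edge and short-edge ratios (my understanding of how mmcv rescales, not something confirmed in this thread); `keep_ratio_size` and `fixed_size` are hypothetical helper names.

```python
def keep_ratio_size(width, height, scale):
    """Output size of a keep-ratio resize (assumed rule: smaller of the two edge ratios)."""
    long_edge, short_edge = max(scale), min(scale)
    factor = min(long_edge / max(width, height), short_edge / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def fixed_size(width, height, scale):
    """Output size of a fixed-size resize: always the target, aspect ratio ignored."""
    return tuple(scale)


scale = (1920, 1216)  # (w, h), as written in the model config
for w, h in [(1920, 1216), (2000, 2000), (1000, 500)]:
    print((w, h), "keep_ratio ->", keep_ratio_size(w, h, scale),
          "| fixed ->", fixed_size(w, h, scale))
```

For an input that is already 1920x1216 both strategies return the same size, so the two only diverge for images with a different size or aspect ratio.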
Thanks for the response, @irexyc. However, in my case the images I give to the model always have the same static size (1216x1920), so I don't need the resize transform at all. I have also tried the dynamic config, and the results were the same.
@himansh1314
There is a bug in the static resize (mmdet treats the scale of resize as (w, h), while the mmdeploy SDK treats it as (h, w)); this PR will fix it: https://github.com/open-mmlab/mmdeploy/pull/2063.
You can edit pipeline.json and swap the two parameters of the resize to see if it helps.
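The swap can also be scripted instead of edited by hand. A minimal sketch, assuming pipeline.json has the layout dumped earlier in this issue (a `Resize` transform with a two-element `size`); `swap_resize_size` is a hypothetical helper, not part of MMDeploy.

```python
import json


def swap_resize_size(pipeline: dict) -> dict:
    """Swap the two entries of `size` in every Resize transform, in place."""
    for task in pipeline["pipeline"]["tasks"]:
        for transform in task.get("transforms", []):
            size = transform.get("size", [])
            if transform.get("type") == "Resize" and len(size) == 2:
                transform["size"] = size[::-1]
    return pipeline


# Trimmed-down stand-in for the pipeline.json written by --dump-info.
example = {"pipeline": {"tasks": [
    {"module": "Transform", "transforms": [
        {"type": "Resize", "keep_ratio": False, "size": [1920, 1216]}]},
    {"module": "Net"},
]}}

swapped = swap_resize_size(example)
print(json.dumps(swapped["pipeline"]["tasks"][0]["transforms"][0]))
```

To apply it to a real file, load it with `json.load`, call the function, and write the result back with `json.dump`.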
@irexyc Yes, I noticed the bug. I swapped the parameters and it worked fine. In fact, I removed the 'Resize' transform completely and it also worked fine, because my images are already of size 1216x1920 and all the images in my pipeline have a static size. However, the issue I described above still remains.
@irexyc I think there is an issue with post-processing in TensorRT. I say this because the objects detected by both the PyTorch model and the TensorRT model have exactly the same confidence scores. So it is pretty clear that preprocessing, inference and model optimisation are the same, but something differs in post-processing (probably NMS) and discards some detections.
@himansh1314 We made NMS part of model inference instead of postprocessing. However, the NMS settings are fixed when you convert the model (they are not read from your model config). You can edit some parameters in the file configs/mmdet/_base_/base_static.py before converting the model.
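Since the deploy config does not inherit the model's test_cfg, it can help to compare the two side by side. The sketch below uses the values posted in this issue; the pairing of names (for example `nms_pre` with `pre_top_k`) is an assumption based on their apparent meaning, and `find_mismatches` is a hypothetical helper.

```python
# Values copied from this issue: model test_cfg and base_static.py post_processing.
model_test_cfg = {
    "nms_pre": 30000,
    "score_thr": 0.001,
    "nms": {"type": "nms", "iou_threshold": 0.6},
    "max_per_img": 3000,
}
deploy_post_processing = {
    "score_threshold": 0.05,
    "iou_threshold": 0.6,
    "pre_top_k": 5000,
    "keep_top_k": 3000,
}


def find_mismatches(test_cfg: dict, post_processing: dict) -> dict:
    """Return {deploy_key: (pytorch_value, deploy_value)} for settings that differ."""
    # Assumed correspondence between mmdet and mmdeploy names.
    expected = {
        "score_threshold": test_cfg["score_thr"],
        "iou_threshold": test_cfg["nms"]["iou_threshold"],
        "pre_top_k": test_cfg["nms_pre"],
        "keep_top_k": test_cfg["max_per_img"],
    }
    return {key: (value, post_processing[key])
            for key, value in expected.items()
            if post_processing[key] != value}


mismatches = find_mismatches(model_test_cfg, deploy_post_processing)
for key, (torch_value, deploy_value) in mismatches.items():
    print(f"{key}: PyTorch={torch_value}, deploy={deploy_value}")
```

With the values above, the score threshold and the pre-NMS top-k are the two settings that differ.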
@irexyc I think I understand the likely cause. In configs/mmdet/_base_/base_static.py, the parameter pre_top_k is set to 5000. I'm not sure what this parameter means, but my best guess is that it selects the top 5000 predictions by confidence score, which are then sent to NMS. In my config.py file, however, I had set the corresponding value (nms_pre) to 30000, and hence the PyTorch model was able to detect more objects. When I changed the value to 30000 in configs/mmdet/_base_/base_static.py, the model converted successfully, but when tools/deploy.py tested the model for visualisation, it crashed with the error
05/11 12:42:35 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
05/11 12:42:35 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
#assertion/root/workspace/mmdeploy/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassNMS.cu,210
05/11 12:42:52 - mmengine - ERROR - /root/workspace/mmdeploy/tools/deploy.py - create_process - 82 - visualize tensorrt model failed.
Also, when I try to run inference using the mmdeploy_runtime SDK, the operation is aborted during inference. I don't understand how PyTorch NMS can handle 30000 predictions whereas TensorRT fails. I also tried changing the value to 15000, 10000 and 7500, but nothing worked. I think this is an important issue and would really appreciate your help with it. @irexyc
@himansh1314
It seems that you have already modified some of the content of base_static.py. However, you didn't change score_threshold to 0.001. I'm not sure whether it will help, but you can give it a try.
The config file base_static.py
_base_ = ['../../_base_/onnx_config.py']
onnx_config = dict(output_names=['dets', 'labels'], input_shape=None)
codebase_config = dict(
type='mmdet',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.05,
confidence_threshold=0.005, # for YOLOv3
iou_threshold=0.6,
max_output_boxes_per_class=3000,
pre_top_k=5000,
keep_top_k=3000,
background_label_id=-1,
))
There are some asserts in allClassNMS.cu; @grimoire could have a look at this.
This is the error message I get when I convert the model using tools/deploy.py after changing a few settings in the base_static.py file.
build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/12 05:05:21 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
05/12 05:05:21 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
#assertion/root/workspace/mmdeploy/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassNMS.cu,210
05/12 05:05:38 - mmengine - ERROR - /root/workspace/mmdeploy/tools/deploy.py - create_process - 82 - visualize tensorrt model failed.
The code indeed asserts something.
Here's the base_static.py file
_base_ = ['../../_base_/onnx_config.py']
onnx_config = dict(output_names=['dets', 'labels'], input_shape=None)
codebase_config = dict(
type='mmdet',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.001,
confidence_threshold=0.005, # for YOLOv3
iou_threshold=0.6,
max_output_boxes_per_class=3000,
pre_top_k=10000,
keep_top_k=3000,
background_label_id=-1,
))
Note that in the PyTorch config the number of predictions kept before NMS (nms_pre) is set to 30000, whereas pre_top_k in base_static.py is set to 5000. My model is supposed to predict over 1000 objects, which can be densely packed, and hence I tried changing it to larger values such as 30000 and 10000. I just want my TensorRT model to give the same predictions as the PyTorch model and not miss any. Please look into this. @irexyc @grimoire
Please keep pre_top_k = 5000; there are asserts in allClassNMS.cu:
const static int BS = 512;
...
const int t_size = (top_k + BS - 1) / BS;
ASSERT(t_size <= 10);
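The arithmetic behind this limit can be checked quickly. The sketch below reproduces the integer ceiling division from the snippet above, assuming that `top_k` in the kernel corresponds to `pre_top_k` in base_static.py (the thread implies this but does not state it); `t_size_for` is a hypothetical helper.

```python
BS = 512          # threads per block, from the snippet above
MAX_T_SIZE = 10   # limit enforced by ASSERT(t_size <= 10)


def t_size_for(top_k: int) -> int:
    """Integer ceiling division, equivalent to (top_k + BS - 1) / BS in C++."""
    return (top_k + BS - 1) // BS


largest_allowed = MAX_T_SIZE * BS
print("largest top_k that passes the assert:", largest_allowed)

# Values tried in this thread.
for top_k in (5000, 7500, 10000, 15000, 30000):
    t = t_size_for(top_k)
    status = "ok" if t <= MAX_T_SIZE else "trips the assert"
    print(f"pre_top_k={top_k:>5} -> t_size={t:>2} ({status})")
```

Under that assumption, every value above 5120 fails the check, which is consistent with 7500, 10000, 15000 and 30000 all crashing.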
@irexyc @grimoire I understand that there are some asserts. Is there any way around this? I think this cap on pre_top_k is causing the large difference between the TensorRT and PyTorch results. Is there a way to run inference with TensorRT and do the NMS and post-processing in PyTorch? I don't mind if the latency goes up a little.
@irexyc If I comment out the ASSERT in the repository and rebuild the container from scratch, would it work? Or are there other checks and dependencies in other parts of the code?
I think NMS will work after commenting out the assert.
I'm not quite sure why there is an assert compared to https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/kernels/allClassNMS.cu; @grimoire may be able to explain.
With score_threshold=0.001, the results are still very different from PyTorch, right?
t_size is the cache size of each CUDA thread in the NMS kernel.
https://github.com/NVIDIA/TensorRT/blob/96e23978cd6e4a8fe869696d3d8ec2b47120629b/plugin/common/kernels/allClassNMS.cu#L196
A large cache size leads to low occupancy (a large number of registers is required). https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm
If you insist ... add P(X) in
https://github.com/open-mmlab/mmdeploy/blob/26b66ef5112ce47b8f4562eef49aae9614b8c633/csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassNMS.cu#L202
and comment out the assert.
@irexyc Yes, even with a score threshold of 0.001 the results didn't change. @grimoire Hi, I appreciate your help, thanks. Can you please tell me what X is in the P(X) you mentioned and, if possible, spell out the exact change? Sorry, I'm not very familiar with CUDA programming. I would appreciate it if you could show me exactly which code to change if I want to set the pre_top_k parameter to, let's say, 30000.
My guess is that X in P(X) is the number of registers, or perhaps threads? In that case, should I modify it like this:
#define P(tsize) allClassNMS_kernel<T_SCORE, T_BBOX, (tsize)>
void (*kernel[30])(const int, const int, const int, const int, const float, const bool,
const bool, float *, T_SCORE *, int *, T_SCORE *, int *, bool) = {
P(1), P(2), P(3), P(4), P(5), P(6), P(7), P(8), P(9), P(10),
P(11), P(12), P(13), P(14), P(15), P(16), P(17), P(18), P(19), P(20),
P(21), P(22), P(23), P(24), P(25), P(26), P(27), P(28), P(29), P(30)
};
//ASSERT(t_size <= 10);
Also, just to confirm: if I change the code, I'll have to rebuild everything, right? And from which step exactly should I rebuild? I'm optimising and running the SDK inside the docker container that you provided, so I guess I have to make the changes once the repo is cloned.
X is the t_size you want.
// BS is 512
const int t_size = (top_k + BS - 1) / BS;
So 30000 requires t_size = 60, I guess?
As the Dockerfile indicates (https://github.com/open-mmlab/mmdeploy/blob/26b66ef5112ce47b8f4562eef49aae9614b8c633/docker/GPU/Dockerfile#L68),
MMDeploy should already be located somewhere in the container. So running make again in the build directory after updating the code should be enough.
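Writing out a long list of P(...) entries by hand is error-prone, so the table body can be generated. A minimal sketch: it assumes the kernel is looked up by index `t_size - 1`, as I recall from the NVIDIA implementation linked above, so the table needs at least `t_size` entries and the array size in the declaration has to match. Note that exact integer ceiling division gives 59 for 30000, so the estimate of 60 is a slight over-provision. `kernel_table_entries` is a hypothetical helper.

```python
BS = 512  # threads per block, as in allClassNMS.cu


def kernel_table_entries(pre_top_k: int, per_line: int = 10) -> str:
    """Return 'P(1), P(2), ...' covering every t_size up to the one pre_top_k needs."""
    t_size = (pre_top_k + BS - 1) // BS
    entries = [f"P({i})" for i in range(1, t_size + 1)]
    lines = [", ".join(entries[i:i + per_line])
             for i in range(0, len(entries), per_line)]
    return ",\n".join(lines)


table = kernel_table_entries(30000)
print(table)
```

The printed text is meant to be pasted between the braces of the kernel array initialiser; as noted earlier in the thread, such large per-thread caches are expected to reduce occupancy.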
Yes, correct, t_size should be 60, my bad; I should have written 'and so on' after P(30). Anyway, I will make the changes, build again and see if it works. Thanks for helping. I'll let you know how it goes.
I made the changes and started the build process again, however, it ends up with this error after make -j$(nproc)
[ 80%] Linking CUDA device code CMakeFiles/mmdeploy_tensorrt_ops.dir/cmake_device_link.o
[ 80%] Building CXX object csrc/mmdeploy/net/trt/CMakeFiles/mmdeploy_trt_net.dir/trt_net.cpp.o
[ 80%] Linking CXX shared module ../../../../lib/libmmdeploy_tensorrt_ops.so
/usr/bin/ld: CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/allClassNMS.cu.o: in function `allClassNMS(CUstream_st*, int, int, int, int, float, bool, bool, nvinfer1::DataType, nvinfer1::DataType, void*, void*, void*, void*, void*, bool)':
tmpxft_00005952_00000000-6_allClassNMS.compute_87.cudafe1.cpp:(.text+0x10): multiple definition of `allClassNMS(CUstream_st*, int, int, int, int, float, bool, bool, nvinfer1::DataType, nvinfer1::DataType, void*, void*, void*, void*, void*, bool)'; CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/.ipynb_checkpoints/allClassNMS-checkpoint.cu.o:tmpxft_00005950_00000000-6_allClassNMS-checkpoint.compute_87.cudafe1.cpp:(.text+0x10): first defined here
/usr/bin/ld: CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/allClassNMS.cu.o: in function `nmsInit()':
tmpxft_00005952_00000000-6_allClassNMS.compute_87.cudafe1.cpp:(.text+0x120): multiple definition of `nmsInit()'; CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/.ipynb_checkpoints/allClassNMS-checkpoint.cu.o:tmpxft_00005950_00000000-6_allClassNMS-checkpoint.compute_87.cudafe1.cpp:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [csrc/mmdeploy/backend_ops/tensorrt/CMakeFiles/mmdeploy_tensorrt_ops.dir/build.make:240: lib/libmmdeploy_tensorrt_ops.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:262: csrc/mmdeploy/backend_ops/tensorrt/CMakeFiles/mmdeploy_tensorrt_ops.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 80%] Linking CXX static library ../../../../lib/libmmdeploy_trt_net.a
[ 80%] Built target mmdeploy_trt_net
make: *** [Makefile:130: all] Error 2
Can you try to clean the old build folder and build again?
@irexyc @grimoire That didn't help. I restarted the entire Docker build from scratch, and this time I got another error:
/usr/bin/ld: CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/allClassNMS.cu.o: in function `allClassNMS(CUstream_st*, int, int, int, int, float, bool, bool, nvinfer1::DataType, nvinfer1::DataType, void*, void*, void*, void*, void*, bool)':
tmpxft_00000337_00000000-6_allClassNMS.compute_87.cudafe1.cpp:(.text+0x10): multiple definition of `allClassNMS(CUstream_st*, int, int, int, int, float, bool, bool, nvinfer1::DataType, nvinfer1::DataType, void*, void*, void*, void*, void*, bool)'; CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/.ipynb_checkpoints/allClassNMS-checkpoint.cu.o:tmpxft_00000338_00000000-6_allClassNMS-checkpoint.compute_87.cudafe1.cpp:(.text+0x10): first defined here
/usr/bin/ld: CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/allClassNMS.cu.o: in function `nmsInit()':
tmpxft_00000337_00000000-6_allClassNMS.compute_87.cudafe1.cpp:(.text+0x120): multiple definition of `nmsInit()'; CMakeFiles/mmdeploy_tensorrt_ops_obj.dir/common_impl/nms/.ipynb_checkpoints/allClassNMS-checkpoint.cu.o:tmpxft_00000338_00000000-6_allClassNMS-checkpoint.compute_87.cudafe1.cpp:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [csrc/mmdeploy/backend_ops/tensorrt/CMakeFiles/mmdeploy_tensorrt_ops.dir/build.make:240: lib/libmmdeploy_tensorrt_ops.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:226: csrc/mmdeploy/backend_ops/tensorrt/CMakeFiles/mmdeploy_tensorrt_ops.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
@irexyc @grimoire Can you please create a temporary branch where you fix this issue? It would be very helpful not only for me but for all developers who are building their own detectors on custom datasets with different requirements.
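Both "multiple definition" messages above name a second object file compiled from common_impl/nms/.ipynb_checkpoints/allClassNMS-checkpoint.cu, which looks like a Jupyter autosave copy of the edited file that the build picked up alongside the original. Assuming that is the cause, here is a minimal sketch for listing such copies in the source tree before re-running cmake. `find_checkpoint_copies` is a hypothetical helper, not part of mmdeploy, and the demo runs on a throwaway directory rather than a real checkout.

```python
import os
import tempfile


def find_checkpoint_copies(src_root):
    """Return paths of files that live inside any .ipynb_checkpoints directory."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        if os.path.basename(dirpath) == ".ipynb_checkpoints":
            hits.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(hits)


# Demo on a temporary tree that mimics the layout named in the linker error.
with tempfile.TemporaryDirectory() as root:
    nms_dir = os.path.join(root, "common_impl", "nms")
    os.makedirs(os.path.join(nms_dir, ".ipynb_checkpoints"))
    for rel in ("allClassNMS.cu",
                os.path.join(".ipynb_checkpoints", "allClassNMS-checkpoint.cu")):
        with open(os.path.join(nms_dir, rel), "w") as fh:
            fh.write("// placeholder\n")
    # Only the autosave copy should be reported, not the real source file.
    stray = [os.path.relpath(p, root) for p in find_checkpoint_copies(root)]
    print(stray)
```

If such a copy shows up under the real csrc/ tree, removing the .ipynb_checkpoints directory and re-configuring from an empty build folder would be the thing to try.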
Can you print the output of git diff under the mmdeploy root folder? I want to see the modifications you made.
There you go. Please have a look at it, and let me know of any changes.
The changes are the same as mine.
I followed these steps and didn't hit any error.
docker run -it --rm --gpus all ubuntu20.04-cuda11.3-mmdeploy1.0.0
cd /root/workspace/mmdeploy/build
vim ../csrc/mmdeploy/backend_ops/tensorrt/common_impl/nms/allClassNMS.cu # edit the code
make -j8 && make install
@irexyc So you didn't go through the cmake process again like mentioned in the dockerfile?
RUN git clone -b main https://github.com/open-mmlab/mmdeploy &&\
cd mmdeploy &&\
if [ -z ${VERSION} ] ; then echo "No MMDeploy version passed in, building on main" ; else git checkout tags/v${VERSION} -b tag_v${VERSION} ; fi &&\
git submodule update --init --recursive &&\
mkdir -p build &&\
cd build &&\
cmake -DMMDEPLOY_TARGET_BACKENDS="ort;trt" .. &&\
make -j$(nproc) &&\
cd .. &&\
/opt/conda/bin/mim install -e .
I can't see the cmake step in the commands you just shared.
@himansh1314 No, I didn't go through the cmake process because I didn't hit a compiler error.
You did hit the compiler error, so I suggest you delete the build folder and re-configure the project.
Note that cmake -DMMDEPLOY_TARGET_BACKENDS="ort;trt" .. only builds the custom ops.
Since you use the SDK, you could refer to these lines to configure and build mmdeploy: https://github.com/open-mmlab/mmdeploy/blob/main/docker/GPU/Dockerfile#L89C1-L102
@irexyc @grimoire
I was able to make changes to the allClassNMS.cu file and compile it successfully. However, this time after converting, I got another assertion error from csrc/mmdeploy/backend_ops/tensorrt/batched_nms/trt_batched_nms.cpp at line 103.
I also got the error "too many resources requested" from allClassNMS.cu at line 703.
Is there a way I can run NMS separately in PyTorch? NMS with top_pred_k of 30000 seems to work fine in PyTorch. I don't mind if inference time increases a little.
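To make the question concrete, here is a dependency-free sketch of greedy NMS applied to raw boxes and scores outside the engine. It is an illustration under assumptions, not mmdeploy's or mmdet's implementation: the function names and the `pre_top_k` parameter are hypothetical, and in a real PyTorch pipeline torchvision.ops.nms(boxes, scores, iou_threshold) would do the suppression step. It does not show how to export the engine without the built-in NMS step, which the thread leaves open.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold, pre_top_k=None):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if pre_top_k is not None:
        # Hypothetical cap on candidates considered before suppression.
        order = order[:pre_top_k]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept_all = greedy_nms(boxes, scores, iou_threshold=0.5)
kept_capped = greedy_nms(boxes, scores, iou_threshold=0.5, pre_top_k=2)
print(kept_all, kept_capped)
```

In the demo, the cap of 2 discards the third box before suppression even though it overlaps nothing, which is one way a small pre-NMS candidate limit can drop objects in dense scenes.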
@himansh1314 @irexyc @grimoire I had the same issue in mmdeploy 0.13. You are right that the conversion is wrong, but the problem is in preprocessing: keep_ratio is not converted correctly.
This is a copy from your config:
config.py
pipeline=[
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='Resize', scale=(1920, 1216), keep_ratio=True),
    dict(type='Pad', size=(1920, 1216), pad_val=dict(img=(114, 114, 114))),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor'))
],
pipeline.json
"transforms": [
    { "type": "LoadImageFromFile", "backend_args": null },
    { "type": "Resize", "keep_ratio": false, "size": [ 1920, 1216 ] },
    { "type": "Normalize", "to_rgb": true, "mean": [ 179.92, 149.48, 198.26 ], "std": [ 14.06, 11.88, 11.06 ] },
    { "type": "Pad", "size_divisor": 1 },
    .....
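A small sketch of why this mismatch matters, and how one might detect it. The transform lists are trimmed copies of the two snippets above; the 1920x1080 input image is an assumed example, and `resized_shape` uses a commonly used keep-ratio rescale rule (fit both the long and the short edge inside the target) that has not been checked against either mmdeploy version's source. Both helper names are hypothetical.

```python
def resized_shape(img_w, img_h, target_w, target_h, keep_ratio):
    """Return (w, h) after resizing to the target, with or without keeping aspect ratio."""
    if not keep_ratio:
        # The image is stretched to exactly the target size.
        return target_w, target_h
    # Assumed rule: one scale factor so long and short edges both fit the target.
    scale = min(max(target_w, target_h) / max(img_w, img_h),
                min(target_w, target_h) / min(img_w, img_h))
    return int(img_w * scale + 0.5), int(img_h * scale + 0.5)


def resize_keep_ratio(transforms):
    """Return the keep_ratio flag of the first Resize step, or None if absent."""
    for step in transforms:
        if step.get("type") == "Resize":
            return step.get("keep_ratio")
    return None


config_pipeline = [
    {"type": "LoadImageFromFile", "backend_args": None},
    {"type": "Resize", "scale": (1920, 1216), "keep_ratio": True},
    {"type": "Pad", "size": (1920, 1216)},
]
sdk_pipeline = [
    {"type": "LoadImageFromFile", "backend_args": None},
    {"type": "Resize", "keep_ratio": False, "size": [1920, 1216]},
    {"type": "Pad", "size_divisor": 1},
]

mismatch = resize_keep_ratio(config_pipeline) != resize_keep_ratio(sdk_pipeline)
kept = resized_shape(1920, 1080, 1920, 1216, keep_ratio=True)
stretched = resized_shape(1920, 1080, 1920, 1216, keep_ratio=False)
print(mismatch, kept, stretched)
```

Under these assumptions the training-time pipeline leaves the 1920x1080 frame at its original size and pads it, while the converted pipeline stretches it vertically to 1216 rows, so the deployed model sees objects with a different aspect ratio than it was trained on.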
@RunningLeon @irexyc @grimoire Can you confirm there is a bug in 0.13? Please see my previous comment.
@shimen hi,
This model config (it has PackDetInputs) is from mmdet 3.0, which should be used with mmdeploy>=1.0.0.
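The rule of thumb in this reply can be written as a quick pre-conversion check. This is only a sketch of the heuristic stated above (a PackDetInputs step implies an mmdet 3.x config, which calls for mmdeploy 1.0.0 or later). The function names are hypothetical, and the version is passed in as a string rather than read from an installed package.

```python
def config_is_mmdet3(pipeline):
    """Heuristic from the thread: PackDetInputs marks an mmdet 3.x config."""
    return any(step.get("type") == "PackDetInputs" for step in pipeline)


def mmdeploy_version_ok(pipeline, mmdeploy_version):
    """Return False when an mmdet 3.x config is paired with mmdeploy below 1.0.0."""
    if not config_is_mmdet3(pipeline):
        return True
    # Only the major version matters for the ">= 1.0.0" rule.
    major = int(mmdeploy_version.split(".")[0])
    return major >= 1


pipeline = [
    {"type": "LoadImageFromFile"},
    {"type": "Resize", "scale": (1920, 1216), "keep_ratio": True},
    {"type": "PackDetInputs"},
]

print(mmdeploy_version_ok(pipeline, "0.13.0"))
print(mmdeploy_version_ok(pipeline, "1.0.0"))
```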