
Implemented the RTMDet object detection algorithm and tested its FPS; it only reaches 7-8 FPS, not the official 300+

Open lai-serena opened this issue 1 year ago • 9 comments

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.7.0
MMEngine: 0.6.0
MMDetection: 3.0.0rc6+61dd8d5

Reproduces the problem - code sample

import cv2
from mmdet.apis import init_detector, inference_detector

config_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py"
checkpoint_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth"
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')
result = inference_detector(model, img)
print(result)

Reproduces the problem - command or script

I implemented the RTMDet object detection algorithm and used time.time() to measure its FPS, but I only get 7-8 FPS instead of the official 300+. It is unclear whether something is wrong that makes detection this slow.
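
For context, below is a sketch of a per-image timing loop of the kind described (the exact loop was not posted, so this is an assumption); warm-up iterations and CUDA synchronisation are added so that the timing itself is not the issue. Even so, it measures the end-to-end latency of inference_detector, including the Python test pipeline and NMS on every call, so it is typically much lower than a throughput-oriented benchmark of the network alone.

import time
import cv2
import torch
from mmdet.apis import init_detector, inference_detector

# same (user-specific) config and checkpoint as in the code sample above
config_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py"
checkpoint_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth"
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')

# warm-up so CUDA context creation does not dominate the first measurements
for _ in range(5):
    inference_detector(model, img)
torch.cuda.synchronize()

n = 100
start = time.time()
for _ in range(n):
    inference_detector(model, img)
torch.cuda.synchronize()
print('end-to-end FPS:', n / (time.time() - start))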

Reproduces the problem - error message

no error

Additional information

No response

lai-serena avatar Mar 28 '23 07:03 lai-serena

@lai-serena To test the FPS of the model, you should use the https://github.com/open-mmlab/mmdetection/blob/3.x/tools/analysis_tools/benchmark.py script. Your testing method is not correct.
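
A typical invocation looks roughly like: python tools/analysis_tools/benchmark.py CONFIG --checkpoint CHECKPOINT --task inference (the exact arguments may differ between versions, so please check the script's --help output). Note that the inference task has to be selected explicitly; by default the script only benchmarks the dataloader.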

hhaAndroid avatar Mar 29 '23 06:03 hhaAndroid

[screenshot: benchmark.py output]

Thanks for answering my question! This is the result of testing FPS on my machine.
1. I don't quite understand: with fps = 33.5 batch/s and batch_size = 5, does this mean FPS = 33.5 * 5 = 167.5 images/s?
2. I want to deploy on this machine, but with demo/inference_demo.ipynb I need about 0.4 s to get the model's results. Why is this different from the FPS measured with tools/analysis_tools/benchmark.py?

lai-serena avatar Mar 29 '23 09:03 lai-serena

Hi, I am also using tools/analysis_tools/benchmark.py, but I saw parser.add_argument('--task', choices=['inference', 'dataloader', 'dataset'], default='dataloader', help='Which task do you want to go to benchmark') in the code, so I am wondering: did you use the default --task dataloader or --task inference?

GTrui6 avatar Apr 02 '23 13:04 GTrui6

I agree with @GTrui6 that many people probably use the default --task dataloader instead of --task inference, so the resulting FPS is unrealistically high. I tried to reproduce the FPS numbers for RTMDet and other YOLO models with benchmark.py. All the YOLO models' numbers could be reproduced, while RTMDet was significantly slower than officially reported.

I hope this is not the case, but I am afraid RTMDet's official benchmark may be wrong. That would be very critical. Can someone look into it? @hhaAndroid

zenjieli avatar May 09 '23 11:05 zenjieli

Is there a bug in inference mode? I cannot run the inference mode. I also wonder: does the dataset task include the inference task?

TheGreatTreatsby avatar May 17 '23 09:05 TheGreatTreatsby

Hi, what are the FPS rates for the RTMDet models? Can you please specify them, and can you also give the command for running benchmark.py?

Vibhuvan avatar Jun 29 '23 19:06 Vibhuvan

Hi, I have the same issue here: https://github.com/open-mmlab/mmdetection/issues/11599. Can you please help me? @Vibhuvan

tmax-cn avatar Apr 03 '24 02:04 tmax-cn

I am running the benchmark script in inference mode and am also unable to reproduce the inference-speed results for the RTMDet models. I used a fixed 640x640 input size on the COCO detection dataset to generate a simple plot (note that the mAP values were taken directly from the MMDet model zoo tables).

[plot: COCO mAP vs. measured FPS for RTMDet and other YOLO-family models at 640x640 input]

Something is clearly going on here, since the RTMDet models are the slowest by a large margin.

This is the config I have used to test RTMDet Large:

model = dict(
    type='RTMDet',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[103.53, 116.28, 123.675],
        std=[57.375, 57.12, 58.395],
        bgr_to_rgb=False,
        batch_augments=None),
    backbone=dict(
        type='CSPNeXt',
        arch='P5',
        expand_ratio=0.5,
        deepen_factor=1,
        widen_factor=1,
        channel_attention=True,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='CSPNeXtPAFPN',
        in_channels=[256, 512, 1024],
        out_channels=256,
        num_csp_blocks=3,
        expand_ratio=0.5,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='RTMDetSepBNHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        anchor_generator=dict(
            type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        loss_cls=dict(
            type='QualityFocalLoss',
            use_sigmoid=True,
            beta=2.0,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        with_objectness=False,
        exp_on_reg=True,
        share_conv=True,
        pred_kernel_size=1,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    train_cfg=dict(
        assigner=dict(type='DynamicSoftLabelAssigner', topk=13),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=30000,
        min_bbox_size=0,
        score_thr=0.001,
        nms=dict(type='nms', iou_threshold=0.65),
        max_per_img=300),
)

env_cfg = dict(
    mp_cfg=dict(opencv_num_threads=0, mp_start_method='fork'),
    dist_cfg=dict(backend='nccl'))

dataset_type = 'CocoDataset'
data_root = 'coco/'
image_size = (640, 640)
backend_args = None

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=image_size, keep_ratio=False),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]

test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='images/val2017/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
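
As a further cross-check, a forward-only timing loop such as the rough sketch below (the config and checkpoint paths are placeholders, substitute your own) removes the dataloader, preprocessing and post-processing from the measurement and times just the backbone, neck and head on a fixed 640x640 tensor:

import time
import torch
from mmdet.apis import init_detector

# placeholder paths; the checkpoint can also be None for a pure speed test
config_file = 'configs/rtmdet/rtmdet_l_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_l_8xb32-300e_coco.pth'
# init_detector already moves the model to the device and sets eval mode
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# random 640x640 input already on the GPU, so only the network itself is timed
x = torch.randn(1, 3, 640, 640, device='cuda:0')

def forward_once(inp):
    # backbone -> neck -> head, i.e. everything except pre/post-processing
    feats = model.backbone(inp)
    feats = model.neck(feats)
    return model.bbox_head(feats)

with torch.no_grad():
    for _ in range(10):  # warm-up
        forward_once(x)
    torch.cuda.synchronize()
    n = 100
    start = time.time()
    for _ in range(n):
        forward_once(x)
    torch.cuda.synchronize()
print('forward-only FPS:', n / (time.time() - start))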

mmeendez8 avatar May 03 '24 11:05 mmeendez8

I have opened a new issue with some code that reproduces these findings: https://github.com/open-mmlab/mmdetection/issues/11682

mmeendez8 avatar May 06 '24 11:05 mmeendez8