
Implemented the RTMDet object detection algorithm and tested its FPS; it only reaches 7-8 FPS, not the official 300+

Open lai-serena opened this issue 1 year ago • 9 comments

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.7.0
MMEngine: 0.6.0
MMDetection: 3.0.0rc6+61dd8d5

Reproduces the problem - code sample

import cv2
from mmdet.apis import init_detector, inference_detector

config_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py"
checkpoint_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth"
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')
result = inference_detector(model, img)
print(result)

Reproduces the problem - command or script

I implemented the RTMDet object detection algorithm and used time.time() to measure its FPS, but I only get 7-8 FPS instead of the official 300+. It is unclear whether something is wrong that makes detection this slow.
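
For context, below is a sketch of a per-image timing loop of the kind described (the exact loop was not posted, so this is an assumption); warm-up iterations and CUDA synchronisation are added so that the timing itself is not the issue. Even so, it measures the end-to-end latency of inference_detector, including the Python test pipeline and NMS on every call, so it is typically much lower than a throughput-oriented benchmark of the network alone.

import time
import cv2
import torch
from mmdet.apis import init_detector, inference_detector

# same (user-specific) config and checkpoint as in the code sample above
config_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py"
checkpoint_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth"
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')

# warm-up so CUDA context creation does not dominate the first measurements
for _ in range(5):
    inference_detector(model, img)
torch.cuda.synchronize()

n = 100
start = time.time()
for _ in range(n):
    inference_detector(model, img)
torch.cuda.synchronize()
print('end-to-end FPS:', n / (time.time() - start))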

Reproduces the problem - error message

no error

Additional information

No response

lai-serena avatar Mar 28 '23 07:03 lai-serena

@lai-serena To test the FPS of the model, you should use the https://github.com/open-mmlab/mmdetection/blob/3.x/tools/analysis_tools/benchmark.py script. Your testing method is not correct.
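
A typical invocation looks roughly like: python tools/analysis_tools/benchmark.py CONFIG --checkpoint CHECKPOINT --task inference (the exact arguments may differ between versions, so please check the script's --help output). Note that the inference task has to be selected explicitly; by default the script only benchmarks the dataloader.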

hhaAndroid avatar Mar 29 '23 06:03 hhaAndroid

[screenshot: benchmark.py output]

Thanks for answering my question! This is the result of testing FPS on my machine.
1. I don't quite understand: with fps = 33.5 batch/s and batch_size = 5, does this mean FPS = 33.5 * 5 = 167.5 images/s?
2. I want to deploy on this machine, but with demo/inference_demo.ipynb I need about 0.4 s to get the model's results. Why is this different from the FPS measured with tools/analysis_tools/benchmark.py?

lai-serena avatar Mar 29 '23 09:03 lai-serena

Hi, I am also using tools/analysis_tools/benchmark.py, but I saw parser.add_argument('--task', choices=['inference', 'dataloader', 'dataset'], default='dataloader', help='Which task do you want to go to benchmark') in the code, so I am wondering: did you use the default --task dataloader or --task inference?

GTrui6 avatar Apr 02 '23 13:04 GTrui6

I agree with @GTrui6 that many people probably use the default --task dataloader instead of --task inference, so the resulting FPS is unrealistically high. I tried to reproduce the FPS numbers for RTMDet and other YOLO models with benchmark.py. All the YOLO models' numbers could be reproduced, while RTMDet was significantly slower than officially reported.

I hope this is not the case, but I am afraid RTMDet's official benchmark may be wrong. That would be very critical. Can someone look into it? @hhaAndroid

zenjieli avatar May 09 '23 11:05 zenjieli

Is there a bug in inference mode? I cannot run the inference mode. I also wonder: does the dataset task include the inference task?

TheGreatTreatsby avatar May 17 '23 09:05 TheGreatTreatsby

Hi, what are the FPS rates for the RTMDet models? Can you please specify them, and can you also give the command for running benchmark.py?

Vibhuvan avatar Jun 29 '23 19:06 Vibhuvan

Hi, I have the same issue here: https://github.com/open-mmlab/mmdetection/issues/11599. Can you please help me? @Vibhuvan

tmax-cn avatar Apr 03 '24 02:04 tmax-cn

I am running the benchmark script in inference mode and am also unable to reproduce the inference-speed results for the RTMDet models. I used a fixed 640x640 input size on the COCO detection dataset to generate a simple plot (note that the mAP values were taken directly from the MMDet model zoo tables).

[plot: COCO mAP vs. measured FPS for RTMDet and other YOLO-family models at 640x640 input]

Something is clearly going on here, since the RTMDet models are the slowest by a large margin.

This is the config I have used to test RTMDet Large:

model = dict(
    type='RTMDet',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[103.53, 116.28, 123.675],
        std=[57.375, 57.12, 58.395],
        bgr_to_rgb=False,
        batch_augments=None),
    backbone=dict(
        type='CSPNeXt',
        arch='P5',
        expand_ratio=0.5,
        deepen_factor=1,
        widen_factor=1,
        channel_attention=True,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='CSPNeXtPAFPN',
        in_channels=[256, 512, 1024],
        out_channels=256,
        num_csp_blocks=3,
        expand_ratio=0.5,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='RTMDetSepBNHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        anchor_generator=dict(
            type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        loss_cls=dict(
            type='QualityFocalLoss',
            use_sigmoid=True,
            beta=2.0,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        with_objectness=False,
        exp_on_reg=True,
        share_conv=True,
        pred_kernel_size=1,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    train_cfg=dict(
        assigner=dict(type='DynamicSoftLabelAssigner', topk=13),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=30000,
        min_bbox_size=0,
        score_thr=0.001,
        nms=dict(type='nms', iou_threshold=0.65),
        max_per_img=300),
)

env_cfg = dict(
    mp_cfg=dict(opencv_num_threads=0, mp_start_method='fork'),
    dist_cfg=dict(backend='nccl'))

dataset_type = 'CocoDataset'
data_root = 'coco/'
image_size = (640, 640)
backend_args = None

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=image_size, keep_ratio=False),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]

test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='images/val2017/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
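
As a further cross-check, a forward-only timing loop such as the rough sketch below (the config and checkpoint paths are placeholders, substitute your own) removes the dataloader, preprocessing and post-processing from the measurement and times just the backbone, neck and head on a fixed 640x640 tensor:

import time
import torch
from mmdet.apis import init_detector

# placeholder paths; the checkpoint can also be None for a pure speed test
config_file = 'configs/rtmdet/rtmdet_l_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_l_8xb32-300e_coco.pth'
# init_detector already moves the model to the device and sets eval mode
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# random 640x640 input already on the GPU, so only the network itself is timed
x = torch.randn(1, 3, 640, 640, device='cuda:0')

def forward_once(inp):
    # backbone -> neck -> head, i.e. everything except pre/post-processing
    feats = model.backbone(inp)
    feats = model.neck(feats)
    return model.bbox_head(feats)

with torch.no_grad():
    for _ in range(10):  # warm-up
        forward_once(x)
    torch.cuda.synchronize()
    n = 100
    start = time.time()
    for _ in range(n):
        forward_once(x)
    torch.cuda.synchronize()
print('forward-only FPS:', n / (time.time() - start))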

mmeendez8 avatar May 03 '24 11:05 mmeendez8

I have opened a new issue with some code that reproduces these findings: https://github.com/open-mmlab/mmdetection/issues/11682

mmeendez8 avatar May 06 '24 11:05 mmeendez8