mmdetection
Implemented the RTMDet object detection algorithm and tested its FPS; it is only 7-8, not the official 300+
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the FAQ documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (master) or latest version (3.x).
Task
I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmdetection
Environment
sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.10.0
OpenCV: 4.7.0
MMEngine: 0.6.0
MMDetection: 3.0.0rc6+61dd8d5
Reproduces the problem - code sample
import cv2
from mmdet.apis import inference_detector, init_detector

config_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py"
checkpoint_file = r"./work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth"
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')
result = inference_detector(model, img)
print(result)
Reproduces the problem - command or script
I implemented the RTMDet object detection algorithm and used time.time() to measure its FPS. The result is only 7-8, not the official 300+. I am not sure whether something is wrong that makes the detection this slow.
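For reference, the exact timing loop is not shown above; a minimal sketch of what a time.time()-based measurement usually needs to look like (warm-up runs plus CUDA synchronization, so that one-time setup and queued GPU work are not counted) is given below. The paths are reused from the code sample above; this is an illustration, not the actual script used:

import time

import cv2
import torch
from mmdet.apis import inference_detector, init_detector

config_file = './work_dirs/rtmdet_s_8xb32-300e_seal/rtmdet_s_8xb32-300e_seal.py'
checkpoint_file = './work_dirs/rtmdet_s_8xb32-300e_seal/epoch_300.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = cv2.imread('demo.jpg')

# Warm up so CUDA context creation and cuDNN autotuning are not timed.
for _ in range(10):
    inference_detector(model, img)

torch.cuda.synchronize()  # finish queued GPU work before starting the clock
num_runs = 100
start = time.time()
for _ in range(num_runs):
    inference_detector(model, img)
torch.cuda.synchronize()  # wait for the last forward pass to complete
elapsed = time.time() - start
print(f'{num_runs / elapsed:.1f} images/s (end-to-end)')

Even measured this way, inference_detector covers the full pipeline (CPU preprocessing, host-to-device copies, postprocessing), so the number will still be lower than a pure model-forward FPS.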
Reproduces the problem - error message
no error
Additional information
No response
@lai-serena To test the FPS of the model, you should use the https://github.com/open-mmlab/mmdetection/blob/3.x/tools/analysis_tools/benchmark.py script. Your testing method is not correct.
Thanks for answering my question! This is the result of testing FPS on my machine.
1. I don't quite understand: fps = 33.5 batch/s with batch_size = 5. Does this mean FPS = 33.5 * 5 = 167.5 images/s?
2. I want to deploy on this machine, but with demo/inference_demo.ipynb I need about 0.4 s to get the model's results. Why is this different from the FPS measured with tools/analysis_tools/benchmark.py?
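For question 1, the conversion being asked about is plain arithmetic; whether the fps value printed by benchmark.py is really per batch is an assumption that its log output should confirm:

# Assumption: the reported fps is batches per second and batch_size is 5.
batches_per_second = 33.5
batch_size = 5
images_per_second = batches_per_second * batch_size  # 167.5 images/s

# Single-image latency observed with demo/inference_demo.ipynb (question 2).
latency_s = 0.4
notebook_fps = 1.0 / latency_s  # 2.5 images/s

Some gap between the two is expected: the notebook path runs one unbatched image at a time and includes preprocessing and first-call overhead, while benchmark.py in inference mode warms up first and, as far as I can tell from its code, times only the model's test step.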
Hi, I am also using tools/analysis_tools/benchmark.py, but I saw parser.add_argument('--task', choices=['inference', 'dataloader', 'dataset'], default='dataloader', help='Which task do you want to go to benchmark') in the code, so I am wondering: did you use the default --task dataloader or --task inference?
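For reference, and only as a guess at the intended usage (check python tools/analysis_tools/benchmark.py -h for the exact arguments of your version; only the --task flag is taken from the argparse snippet quoted above), a run that benchmarks model inference rather than the dataloader would look roughly like:

python tools/analysis_tools/benchmark.py \
    path/to/your_config.py \
    --checkpoint path/to/your_checkpoint.pth \
    --task inference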
I agree with @GTrui6 that many people probably use the default --task dataloader instead of --task inference, so the resulting FPS is unrealistically high. I tried to reproduce the FPS numbers for RTMDet and other YOLO models with benchmark.py. All the YOLO models' performance could be reproduced, while RTMDet was significantly slower than officially reported.
I hope this is not the case, but I am afraid RTMDet's official benchmark is wrong. That would be very serious. Can someone look into it? @hhaAndroid
Is there a bug in inference mode? I cannot run the inference mode. I also wonder: does the dataset mode contain the inference mode?
Hi, what are the FPS rates for the RTMDet models? Can you please specify them, and also give the command for running benchmark.py?
Hi, I have the same issue here: https://github.com/open-mmlab/mmdetection/issues/11599. Can you please help me? @Vibhuvan
I am using the benchmark script in inference mode, and I am also unable to reproduce the inference-speed results for the RTMDet models. I am using a fixed 640x640 input size on the COCO detection dataset to generate a simple plot (note that the mAP values were taken directly from the MMDet model zoo tables).
Something is clearly going on here, since the RTMDet models are the slowest by a large margin.
This is the config I have used to test RTMDet Large:
model = dict(
    type='RTMDet',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[103.53, 116.28, 123.675],
        std=[57.375, 57.12, 58.395],
        bgr_to_rgb=False,
        batch_augments=None),
    backbone=dict(
        type='CSPNeXt',
        arch='P5',
        expand_ratio=0.5,
        deepen_factor=1,
        widen_factor=1,
        channel_attention=True,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='CSPNeXtPAFPN',
        in_channels=[256, 512, 1024],
        out_channels=256,
        num_csp_blocks=3,
        expand_ratio=0.5,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='RTMDetSepBNHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        anchor_generator=dict(
            type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        loss_cls=dict(
            type='QualityFocalLoss',
            use_sigmoid=True,
            beta=2.0,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        with_objectness=False,
        exp_on_reg=True,
        share_conv=True,
        pred_kernel_size=1,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU', inplace=True)),
    train_cfg=dict(
        assigner=dict(type='DynamicSoftLabelAssigner', topk=13),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=30000,
        min_bbox_size=0,
        score_thr=0.001,
        nms=dict(type='nms', iou_threshold=0.65),
        max_per_img=300),
)
env_cfg = dict(
    mp_cfg=dict(opencv_num_threads=0, mp_start_method='fork'),
    dist_cfg=dict(backend='nccl'))
dataset_type = 'CocoDataset'
data_root = 'coco/'
image_size = (640, 640)
backend_args = None
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=image_size, keep_ratio=False),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='images/val2017/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
I have opened a new issue with some code that reproduces these findings: https://github.com/open-mmlab/mmdetection/issues/11682