mmdetection
[Reimplementation] YOLOX bbox pred using t,l,r,b format always produces loss_bbox: 5.0
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the FAQ documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (master) or latest version (3.x).
💬 Describe the reimplementation questions
I replaced the `_bbox_decode` function in YOLOX like this:

```python
def _bbox_decode(self, priors, bbox_preds):
    tl_x = priors[..., 0] - bbox_preds[..., 0]
    tl_y = priors[..., 1] - bbox_preds[..., 1]
    br_x = priors[..., 0] + bbox_preds[..., 2]
    br_y = priors[..., 1] + bbox_preds[..., 3]
    return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)
```
My config:
```python
_base_ = [
    '../configs/yolox/yolox_s_8x8_300e_coco.py'
]
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'path'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    _delete_=True,
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_minitrain2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')
# optimizer
optimizer = dict(_delete_=True, type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(_delete_=True, grad_clip=None)
# learning policy
lr_config = dict(
    _delete_=True,
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
```
My result:
```
2022-10-25 09:29:29,856 - mmdet - INFO - Epoch [1][50/1564] lr: 1.978e-03, eta: 1:15:21, time: 0.242, data_time: 0.070, memory: 10625, loss_cls: 0.9370, loss_bbox: 5.0000, loss_obj: 14.5174, loss: 20.4544
2022-10-25 09:29:38,077 - mmdet - INFO - Epoch [1][100/1564] lr: 3.976e-03, eta: 1:03:09, time: 0.164, data_time: 0.019, memory: 10625, loss_cls: 0.7869, loss_bbox: 5.0000, loss_obj: 7.7109, loss: 13.4978
2022-10-25 09:29:47,287 - mmdet - INFO - Epoch [1][150/1564] lr: 5.974e-03, eta: 1:01:02, time: 0.184, data_time: 0.018, memory: 10625, loss_cls: 0.6463, loss_bbox: 5.0000, loss_obj: 6.0049, loss: 11.6513
2022-10-25 09:29:55,211 - mmdet - INFO - Epoch [1][200/1564] lr: 7.972e-03, eta: 0:57:55, time: 0.158, data_time: 0.019, memory: 10625, loss_cls: 0.5865, loss_bbox: 5.0000, loss_obj: 5.3746, loss: 10.9611
2022-10-25 09:30:03,355 - mmdet - INFO - Epoch [1][250/1564] lr: 9.970e-03, eta: 0:56:15, time: 0.163, data_time: 0.019, memory: 10625, loss_cls: 0.3879, loss_bbox: 5.0000, loss_obj: 5.0617, loss: 10.4496
2022-10-25 09:30:12,697 - mmdet - INFO - Epoch [1][300/1564] lr: 1.197e-02, eta: 0:56:20, time: 0.187, data_time: 0.017, memory: 10692, loss_cls: 0.3199, loss_bbox: 5.0000, loss_obj: 5.3904, loss: 10.7103
```
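For context on why the number is pinned at exactly 5.0: mmdet's YOLOX head uses an IoU-based `loss_bbox` with `loss_weight=5.0`, so if every decoded box is degenerate or inverted, the IoU with any target is 0 and the loss saturates at `loss_weight * 1 = 5.0`. A simplified stand-in sketch (assumed to mirror `IoULoss(mode='square')`; not the library implementation):

```python
import torch

def iou_loss_square(pred, target, loss_weight=5.0, eps=1e-16):
    # Simplified stand-in for an IoULoss(mode='square') as used by YOLOX
    # (assumption: loss_weight=5.0); boxes are [x1, y1, x2, y2].
    tl = torch.max(pred[:, :2], target[:, :2])
    br = torch.min(pred[:, 2:], target[:, 2:])
    wh = (br - tl).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]).clamp(min=0) * \
             (pred[:, 3] - pred[:, 1]).clamp(min=0)
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return loss_weight * (1 - iou ** 2).mean()

# An inverted box (x1 > x2) has zero area, hence IoU = 0 with any target,
# so the loss is pinned at loss_weight * 1 = 5.0 no matter the target.
pred = torch.tensor([[30., 30., 10., 10.]])   # inverted box
target = torch.tensor([[5., 5., 25., 25.]])
loss = iou_loss_square(pred, target)
```

A constant `loss_bbox` of exactly the loss weight is therefore consistent with the decode producing only zero-IoU boxes, rather than the loss failing to converge.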
Environment
sys.platform: linux
Python: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2
OpenCV: 4.6.0
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.1+1b4891c
Expected results
Normal YOLOX bbox pred with the same config but the default `_bbox_decode` function:
```
2022-10-25 09:30:40,552 - mmdet - INFO - Epoch [1][50/1564] lr: 1.978e-03, eta: 1:15:08, time: 0.241, data_time: 0.069, memory: 10627, loss_cls: 1.6528, loss_bbox: 4.7309, loss_obj: 12.7018, loss: 19.0855
2022-10-25 09:30:48,810 - mmdet - INFO - Epoch [1][100/1564] lr: 3.976e-03, eta: 1:03:09, time: 0.165, data_time: 0.020, memory: 10627, loss_cls: 1.8888, loss_bbox: 4.4457, loss_obj: 5.9525, loss: 12.2871
2022-10-25 09:30:58,025 - mmdet - INFO - Epoch [1][150/1564] lr: 5.974e-03, eta: 1:01:03, time: 0.184, data_time: 0.018, memory: 10627, loss_cls: 2.3753, loss_bbox: 4.0260, loss_obj: 5.8119, loss: 12.2131
2022-10-25 09:31:05,916 - mmdet - INFO - Epoch [1][200/1564] lr: 7.972e-03, eta: 0:57:52, time: 0.158, data_time: 0.019, memory: 10627, loss_cls: 2.4342, loss_bbox: 3.9371, loss_obj: 5.5452, loss: 11.9165
```
Additional information
No response
@iumyx2612 Why do you want to modify this place? If you want to refer to the FCOS pattern, you should consider `*strides`.
> @iumyx2612 Why do you want to modify this place? If you want to refer to the FCOS pattern, you should consider `*strides`.
I want to change the bounding box representation of YOLOX to act like FCOS, but somehow it's not working and I don't know why.
> you should consider `*strides`
Can you explain more about this?
In case it helps, I wrote a script to visualize the label assignment process of simOTA.
I ran this on a test dataset here: https://public.roboflow.com/object-detection/synthetic-fruit
This is YOLOX using the new `_bbox_decode` function, which decodes like FCOS:
iter 1:
iter 11:
iter 31:
The bigger the dot, the higher the feature pyramid level it belongs to (larger stride).
And this is normal YOLOX with the default `_bbox_decode`:
iter 1:
iter 11:
iter 31:
@iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free
> @iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free
Yes, I only modified `_bbox_decode` of `YOLOXHead`.
Also, when training, the dataset, data pipeline, optimizer, scheduler, etc. in the config are the same as the `_base_` config (i.e. same img_size when resizing, no Mosaic applied, ...).
If you need the code for the simOTA visualization, I'm willing to share it too.
Turning `with_stride` on or off in the get-priors function may help you.
> Turning `with_stride` on or off in the get-priors function may help you.
I keep `with_stride=True`, since `SimOTAAssigner` requires priors in the format `[cx, cy, stride_w, stride_h]`.
The only difference I made is on line 381: https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/yolox_head.py#L381
where I implemented another `_bbox_decode` function, like FCOS.
Did you miss `exp(bbox_pred)` after the conv? This is done in FCOS's forward pass but in YOLOX's post-processing.
> Did you miss `exp(bbox_pred)` after the conv? This is done in FCOS's forward pass but in YOLOX's post-processing.
I don't think it plays an important role here, since `exp(bbox_pred)` is only used in FCOS without `norm_on_bbox`; FCOS with `norm_on_bbox` directly normalizes `bbox_pred` with the strides instead of using `exp`. If I'm wrong, please correct me.
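To make the distinction concrete, here is a sketch of the two FCOS conventions being discussed (an illustration under assumed shapes, not mmdet's actual code): without `norm_on_bbox` the raw distances pass through `exp`; with it they are passed through `relu` and scaled by the stride. Either way the decoded distances stay non-negative, which the plain t,l,r,b decode in this issue does not guarantee.

```python
import torch
import torch.nn.functional as F

def decode_distances(bbox_preds, stride, norm_on_bbox):
    # FCOS-style mapping from raw head outputs to non-negative l,t,r,b
    # distances (hypothetical helper for illustration).
    if norm_on_bbox:
        return F.relu(bbox_preds) * stride   # clamp, then scale by stride
    return bbox_preds.exp()                  # exp keeps distances positive

raw = torch.tensor([-1.0, 0.0, 2.0])
norm = decode_distances(raw, stride=8, norm_on_bbox=True)    # [0, 0, 16]
unnorm = decode_distances(raw, stride=8, norm_on_bbox=False) # exp(raw), all > 0
```

Under both conventions a negative raw output cannot flip the decoded box corners, unlike feeding the raw values straight into `priors ± bbox_preds`.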
FCOS uses a different loss for different box-encoding settings.
@hhaAndroid please notify me when you start working on this :bowing_man:
> @iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free
>
> Yes, I only modified `_bbox_decode` of `YOLOXHead`. Also, when training, the dataset, data pipeline, optimizer, scheduler, etc. in the config are the same as the `_base_` config (i.e. same img_size when resizing, no Mosaic applied, ...). If you need the code for the simOTA visualization, I'm willing to share it too.
Hi, I want to get the code for the simOTA visualization. Can you share it with me? Thanks.
Hello, I implemented the simOTA visualization as a Hook. You can check it out here:
https://github.com/main-2983/sun-det/blob/end-to-end/mmdet/core/hook/base_label_assignment_vis_hook.py
https://github.com/main-2983/sun-det/blob/end-to-end/mmdet/core/hook/base_simOTA_vis_hook.py