
[Reimplementation] YOLOX bbox pred using t,l,r,b format always produces loss_bbox: 5.0

Open iumyx2612 opened this issue 1 year ago • 13 comments

Prerequisite

💬 Describe the reimplementation questions

I replaced the _bbox_decode function in YOLOX like this:

    def _bbox_decode(self, priors, bbox_preds):
        tl_x = priors[..., 0] - bbox_preds[..., 0]
        tl_y = priors[..., 1] - bbox_preds[..., 1]
        br_x = priors[..., 0] + bbox_preds[..., 2]
        br_y = priors[..., 1] + bbox_preds[..., 3]
        return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)
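
A minimal sketch of why this decode can pin loss_bbox at a constant (an illustration with made-up tensor values, not the confirmed diagnosis): raw conv outputs are roughly zero-mean, so without any non-negativity constraint the decoded top-left corner can cross the bottom-right one. Such a degenerate box has zero IoU with any ground truth, and an IoU loss with weight 5.0 would then sit at exactly 5.0:

```python
import torch

def _bbox_decode_fcos_style(priors, bbox_preds):
    # Decode as in the snippet above: raw predictions are treated
    # directly as pixel distances from the prior centre.
    tl_x = priors[..., 0] - bbox_preds[..., 0]
    tl_y = priors[..., 1] - bbox_preds[..., 1]
    br_x = priors[..., 0] + bbox_preds[..., 2]
    br_y = priors[..., 1] + bbox_preds[..., 3]
    return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)

# A prior centred at (100, 100) with stride 8; early in training the raw
# conv outputs are roughly zero-mean, so negative "distances" are common.
priors = torch.tensor([[100.0, 100.0, 8.0, 8.0]])
bbox_preds = torch.tensor([[-3.0, 2.0, -5.0, 1.0]])

boxes = _bbox_decode_fcos_style(priors, bbox_preds)
# tl_x = 103 > br_x = 95: a degenerate box, zero IoU against any GT.
```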

My config:

_base_ = [
    '../configs/yolox/yolox_s_8x8_300e_coco.py'
]

# dataset settings
dataset_type = 'CocoDataset'
data_root = 'path'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    _delete_=True,
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_minitrain2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')

# optimizer
optimizer = dict(_delete_=True, type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(_delete_=True, grad_clip=None)
# learning policy
lr_config = dict(
    _delete_=True,
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)

My result:

2022-10-25 09:29:29,856 - mmdet - INFO - Epoch [1][50/1564]	lr: 1.978e-03, eta: 1:15:21, time: 0.242, data_time: 0.070, memory: 10625, loss_cls: 0.9370, loss_bbox: 5.0000, loss_obj: 14.5174, loss: 20.4544
2022-10-25 09:29:38,077 - mmdet - INFO - Epoch [1][100/1564]	lr: 3.976e-03, eta: 1:03:09, time: 0.164, data_time: 0.019, memory: 10625, loss_cls: 0.7869, loss_bbox: 5.0000, loss_obj: 7.7109, loss: 13.4978
2022-10-25 09:29:47,287 - mmdet - INFO - Epoch [1][150/1564]	lr: 5.974e-03, eta: 1:01:02, time: 0.184, data_time: 0.018, memory: 10625, loss_cls: 0.6463, loss_bbox: 5.0000, loss_obj: 6.0049, loss: 11.6513
2022-10-25 09:29:55,211 - mmdet - INFO - Epoch [1][200/1564]	lr: 7.972e-03, eta: 0:57:55, time: 0.158, data_time: 0.019, memory: 10625, loss_cls: 0.5865, loss_bbox: 5.0000, loss_obj: 5.3746, loss: 10.9611
2022-10-25 09:30:03,355 - mmdet - INFO - Epoch [1][250/1564]	lr: 9.970e-03, eta: 0:56:15, time: 0.163, data_time: 0.019, memory: 10625, loss_cls: 0.3879, loss_bbox: 5.0000, loss_obj: 5.0617, loss: 10.4496
2022-10-25 09:30:12,697 - mmdet - INFO - Epoch [1][300/1564]	lr: 1.197e-02, eta: 0:56:20, time: 0.187, data_time: 0.017, memory: 10692, loss_cls: 0.3199, loss_bbox: 5.0000, loss_obj: 5.3904, loss: 10.7103

Environment

sys.platform: linux
Python: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2
OpenCV: 4.6.0
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.1+1b4891c

Expected results

Normal YOLOX bbox pred with the same config but the original _bbox_decode function:

2022-10-25 09:30:40,552 - mmdet - INFO - Epoch [1][50/1564]	lr: 1.978e-03, eta: 1:15:08, time: 0.241, data_time: 0.069, memory: 10627, loss_cls: 1.6528, loss_bbox: 4.7309, loss_obj: 12.7018, loss: 19.0855
2022-10-25 09:30:48,810 - mmdet - INFO - Epoch [1][100/1564]	lr: 3.976e-03, eta: 1:03:09, time: 0.165, data_time: 0.020, memory: 10627, loss_cls: 1.8888, loss_bbox: 4.4457, loss_obj: 5.9525, loss: 12.2871
2022-10-25 09:30:58,025 - mmdet - INFO - Epoch [1][150/1564]	lr: 5.974e-03, eta: 1:01:03, time: 0.184, data_time: 0.018, memory: 10627, loss_cls: 2.3753, loss_bbox: 4.0260, loss_obj: 5.8119, loss: 12.2131
2022-10-25 09:31:05,916 - mmdet - INFO - Epoch [1][200/1564]	lr: 7.972e-03, eta: 0:57:52, time: 0.158, data_time: 0.019, memory: 10627, loss_cls: 2.4342, loss_bbox: 3.9371, loss_obj: 5.5452, loss: 11.9165

Additional information

No response

iumyx2612 avatar Oct 25 '22 09:10 iumyx2612

@iumyx2612 Why do you want to modify this place? If you want to refer to the FCOS pattern, you should consider *strides

hhaAndroid avatar Oct 26 '22 02:10 hhaAndroid

@iumyx2612 Why do you want to modify this place? If you want to refer to the FCOS pattern, you should consider *strides

I want to change the bounding box representation of YOLOX to act like FCOS, but somehow it's not working and I don't know why.

you should consider *strides

Can you explain more about this?

iumyx2612 avatar Oct 26 '22 02:10 iumyx2612

If this can be helpful, I wrote a script to visualize the label assignment process of simOTA.
I ran this on a test dataset here: https://public.roboflow.com/object-detection/synthetic-fruit
This is YOLOX using the new _bbox_decode function, which decodes like FCOS:

iter 1: [image: simOTA assignment visualization]

iter 11: [image: simOTA assignment visualization]

iter 31: [image: simOTA assignment visualization]

The bigger the dot, the higher the feature pyramid level it belongs to (larger stride).

And this is normal YOLOX with the normal _bbox_decode:

iter 1: [image: simOTA assignment visualization]

iter 11: [image: simOTA assignment visualization]

iter 31: [image: simOTA assignment visualization]

iumyx2612 avatar Oct 26 '22 03:10 iumyx2612

@iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free

hhaAndroid avatar Nov 01 '22 02:11 hhaAndroid

@iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free

Yes, I only modified _bbox_decode of YOLOXHead.
Also, when training, the dataset config, data pipeline, optimizer, scheduler, etc. are the same as in the _base_ config (i.e. the same img_size when resizing, Mosaic not applied, ...).
If you need the code for the simOTA visualization, I'm willing to share it too.

iumyx2612 avatar Nov 01 '22 06:11 iumyx2612

Closing or opening 'with_stride' in the get priors function may help you.

Co4AI avatar Nov 02 '22 13:11 Co4AI

Closing or opening 'with_stride' in the get priors function may help you.

I keep with_stride, since SimOTAAssigner requires priors to have the format [cx, cy, stride_w, stride_h].

The only difference I made is on line 381: https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/yolox_head.py#L381
where I implemented another _bbox_decode function, like FCOS.

iumyx2612 avatar Nov 03 '22 02:11 iumyx2612

Did you miss exp(bbox_pred) after the conv? This is done in FCOS's forward but in YOLOX's post-processing.

Co4AI avatar Nov 03 '22 03:11 Co4AI

Did you miss exp(bbox_pred) after the conv? This is done in FCOS's forward but in YOLOX's post-processing.

I don't think it plays an important role here, since exp(bbox_pred) is only used in FCOS without norm_on_bbox; FCOS with norm_on_bbox directly normalizes bbox_pred with the strides instead of using exp. If I'm wrong, please correct me.
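
Combining the two suggestions in the thread (scaling by *strides, and keeping distances non-negative the way FCOS with norm_on_bbox does via F.relu in its forward), a decode might look like the sketch below. The [cx, cy, stride_w, stride_h] prior layout and the placement of the ReLU are assumptions for illustration, not a verified fix:

```python
import torch
import torch.nn.functional as F

def _bbox_decode_strided(priors, bbox_preds):
    # priors: (..., 4) as (cx, cy, stride_w, stride_h) -- assumed layout
    # bbox_preds: (..., 4) raw (l, t, r, b) predictions in stride units
    dists = F.relu(bbox_preds)  # clamp distances to be non-negative
    tl_x = priors[..., 0] - dists[..., 0] * priors[..., 2]
    tl_y = priors[..., 1] - dists[..., 1] * priors[..., 3]
    br_x = priors[..., 0] + dists[..., 2] * priors[..., 2]
    br_y = priors[..., 1] + dists[..., 3] * priors[..., 3]
    return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)

priors = torch.tensor([[100.0, 100.0, 8.0, 8.0]])
preds = torch.randn(1, 4)  # raw conv outputs, roughly zero-mean
out = _bbox_decode_strided(priors, preds)
# every decoded box is valid: tl <= br on both axes, so the IoU loss
# no longer saturates the moment a distance prediction goes negative
```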

iumyx2612 avatar Nov 03 '22 03:11 iumyx2612

FCOS uses a different loss for different box-encoding settings.

Co4AI avatar Nov 03 '22 09:11 Co4AI

@hhaAndroid please notify me when you start working on this :bowing_man:

iumyx2612 avatar Nov 11 '22 07:11 iumyx2612

@iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free

Yes, I only modified _bbox_decode of YOLOXHead. Also, when training, the dataset config, data pipeline, optimizer, scheduler, etc. are the same as in the _base_ config (i.e. the same img_size when resizing, Mosaic not applied, ...). If you need the code for the simOTA visualization, I'm willing to share it too

Hi, I'd like to get the code for the simOTA visualization. Can you share it with me? Thanks

Ironbox1004 avatar Dec 21 '22 05:12 Ironbox1004

@iumyx2612 Thank you very much for your feedback. Are all the implementation details included in the issue? I'll debug it when I'm free

Yes, I only modified _bbox_decode of YOLOXHead. Also, when training, the dataset config, data pipeline, optimizer, scheduler, etc. are the same as in the _base_ config (i.e. the same img_size when resizing, Mosaic not applied, ...). If you need the code for the simOTA visualization, I'm willing to share it too

Hi, I'd like to get the code for the simOTA visualization. Can you share it with me? Thanks

Hello, I implemented the simOTA visualization as a Hook. You can check it out here:
https://github.com/main-2983/sun-det/blob/end-to-end/mmdet/core/hook/base_label_assignment_vis_hook.py
https://github.com/main-2983/sun-det/blob/end-to-end/mmdet/core/hook/base_simOTA_vis_hook.py

iumyx2612 avatar Dec 26 '22 06:12 iumyx2612