Why does the DEKR model only work correctly with square input?
### Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmpose).
### Environment
mmcv 2.0.1, mmdet 3.0.0, mmengine 0.8.4, mmpose 1.1.0, mmpretrain 1.0.2
### Reproduces the problem - code sample
```python
_base_ = ['../../../_base_/default_runtime.py']
# runtime
train_cfg = dict(max_epochs=200, val_interval=10)
# optimizer
optim_wrapper = dict(optimizer=dict(
type='Adam',
lr=1e-3,
))
# learning policy
param_scheduler = [
dict(
type='LinearLR', begin=0, end=500, start_factor=0.001,
by_epoch=False), # warm-up
dict(
type='MultiStepLR',
begin=0,
end=140,
milestones=[90, 120],
gamma=0.1,
by_epoch=True)
]
# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=80)
# hooks
default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
# codec settings
codec = dict(
type='SPR',
input_size=(192, 256),
heatmap_size=(48, 64),
sigma=(4, 2),
minimal_diagonal_length=32**0.5,
generate_keypoint_heatmaps=True,
decode_max_instances=30)
# model settings
model = dict(
type='BottomupPoseEstimator',
data_preprocessor=dict(
type='PoseDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True),
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(32, 64)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
multiscale_output=True)),
init_cfg=dict(
type='Pretrained',
checkpoint='/data/home/seondeok/Project/acupoint/mmpose/configs/body_2d_keypoint/pretrain/dekr/dekr_hrnet-w32_8xb10-140e_coco-512x512_ac7c17bf-20221228.pth'),
),
neck=dict(
type='FeatureMapProcessor',
concat=True,
),
head=dict(
type='DEKRHead',
in_channels=480,
num_keypoints=5, # edit
heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True),
displacement_loss=dict(
type='SoftWeightSmoothL1Loss',
use_target_weight=True,
supervise_empty=False,
beta=1 / 9,
loss_weight=0.002,
),
decoder=codec,
# rescore_cfg=dict(
# in_channels=74,
# norm_indexes=(5, 6),
# init_cfg=dict(
# type='Pretrained',
# checkpoint='https://download.openmmlab.com/mmpose/'
# 'pretrain_models/kpt_rescore_coco-33d58c5c.pth')),
),
test_cfg=dict(
multiscale_test=False,
flip_test=True,
nms_dist_thr=0.05,
shift_heatmap=True,
align_corners=False))
# enable DDP training when rescore net is used
find_unused_parameters = True
# base dataset settings
dataset_type = 'CocoArm'
data_mode = 'topdown'
data_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/PK_Train_1462/'
test_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/PK_Test_374/'
annotation_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/json/PK_Train_1462.json'
annotation_root_val = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/json/PK_Test_374.json'
# pipelines
train_pipeline = [
dict(type='LoadImage'),
dict(type='BottomupRandomAffine', input_size=codec['input_size']),
dict(type='RandomFlip', direction='horizontal'),
dict(type='GenerateTarget', encoder=codec),
dict(type='BottomupGetHeatmapMask'),
dict(type='PackPoseInputs'),
]
val_pipeline = [
dict(type='LoadImage'),
dict(
type='BottomupResize',
input_size=codec['input_size'],
size_factor=32,
resize_mode='expand'),
dict(
type='PackPoseInputs',
meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape',
'img_shape', 'input_size', 'input_center', 'input_scale',
'flip', 'flip_direction', 'flip_indices', 'raw_ann_info',
'skeleton_links'))
]
# data loaders
train_dataloader = dict(
batch_size=10,
num_workers=2,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_mode=data_mode,
ann_file=annotation_root,
data_prefix=dict(img=data_root),
pipeline=train_pipeline,
))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
dataset=dict(
type=dataset_type,
data_root=test_root,
data_mode=data_mode,
ann_file=annotation_root_val,
data_prefix=dict(img=test_root),
test_mode=True,
pipeline=val_pipeline,
))
test_dataloader = val_dataloader
# evaluators
val_evaluator = dict(
type='CocoMetric',
ann_file=annotation_root_val,
nms_mode='none',
score_mode='keypoint',
)
test_evaluator = val_evaluator
```
### Reproduces the problem - command or script
```shell
python tools/train.py configs/body_2d_keypoint/dekr/custom_coco/dekr_hrnet-w32_8xb10-140e_coco-256x192.py
```
### Reproduces the problem - error message
train.py runs without error, but when I use the trained weights for inference, the performance is very poor. Compared to square input, keypoint detection does not work well. When I use this model with the default 256x256 input size, it works well, but I have a question: why does DEKR, a bottom-up pose estimation method, not work well with a rectangular input size? Unlike top-down models, why do bottom-up models not work well with 256x192 or other rectangular resizes?
### Additional information
No response
The test-time transform BottomupResize cannot guarantee that the input image will be square. Instead, it typically resizes the image so that its shorter edge matches the corresponding edge of the specified input_size. If the width and height of input_size differ, this can cause problems, as the following code snippet shows:
https://github.com/open-mmlab/mmpose/blob/efe09cd5268d4d6b21100334fbf2947ef36fc7db/mmpose/datasets/transforms/bottomup_transforms.py#L521-L530
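To illustrate, here is a minimal sketch of how the 'expand' resize mode derives the actual test-time input shape; the function and variable names are illustrative, not the exact mmpose implementation:

```python
# Sketch of BottomupResize's 'expand' behaviour (illustrative names):
# the image is rescaled so that it fully covers input_size, i.e. the
# relatively shorter edge is matched, and the resulting shape is then
# rounded up to a multiple of size_factor.
import math

def expand_input_size(img_w, img_h, input_w, input_h, size_factor=32):
    ratio = max(input_w / img_w, input_h / img_h)  # cover, don't fit
    w = math.ceil(img_w * ratio / size_factor) * size_factor
    h = math.ceil(img_h * ratio / size_factor) * size_factor
    return w, h

# With a square input_size, the test-time shape simply keeps the image's
# aspect ratio at the requested scale:
print(expand_input_size(1920, 1080, 512, 512))  # (928, 512)
# With a rectangular input_size such as (192, 256), the scale is driven
# by whichever edge dominates, so the effective test-time shape drifts
# away from the fixed 192x256 crops the model saw during training:
print(expand_input_size(1920, 1080, 192, 256))  # (480, 256)
```

The key point is that at test time the image keeps its own aspect ratio, so a rectangular input_size is matched along only one edge, unlike the fixed-size affine crops used in training.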
Thank you for your answer!
So, is there a way to modify the code so that I can get good test results even with rectangular input sizes? My custom dataset has 1920×1080 resolution, and the DEKR model's test results on it are poor.
You can try to set the input_size to (1080, 1080). During training, the image will be randomly resized and cropped to 1080x1080. During inference, the image will be resized to 1920x1080.
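For reference, following that suggestion, the codec section of the config above might be adapted as sketched below, assuming heatmap_size stays at 1/4 of input_size as in the stock DEKR configs; the values are illustrative and unverified:

```python
# Illustrative codec settings for a 1920x1080 dataset, per the suggestion
# above: square input_size for training crops, heatmap_size at 1/4 of
# input_size (a sketch, not a verified config).
codec = dict(
    type='SPR',
    input_size=(1080, 1080),
    heatmap_size=(270, 270),
    sigma=(4, 2),
    minimal_diagonal_length=32**0.5,
    generate_keypoint_heatmaps=True,
    decode_max_instances=30)
```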
Thank you very much! I will try it.