mmpose
[Bug] DEKR model trained on custom dataset results in poor performance
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmpose).
Environment
python -c "from mmpose.utils import collect_env; print(collect_env())"
OrderedDict([('sys.platform', 'linux'), ('Python', '3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28) [GCC 12.3.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA TITAN X (Pascal)'), ('CUDA_HOME', '/usr/local/cuda-11.8'), ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'), ('GCC', 'gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0'), ('PyTorch', '2.0.1'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.8\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.7\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow 
-Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.15.2'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.2'), ('MMPose', '1.3.1+5a3be94')])
pip list | grep mm
comm 0.2.1
diagnostic_common_diagnostics 1.9.7
mmcv 2.1.0
mmdeploy 1.3.1 /home/lmga-titanx/openmmlab/mmdeploy
mmdeploy-runtime 1.3.1
mmdeploy-runtime-gpu 1.3.1
mmdet 3.2.0
mmengine 0.10.2
mmpose 1.3.1 /home/lmga-titanx/openmmlab/mmpose
qt-gui-py-common 0.4.2
rqt_py_common 0.5.3
Reproduces the problem - code sample
My config file:
auto_scale_lr = dict(base_batch_size=10)
backend_args = dict(backend='local')
codec = dict(
decode_max_instances=30,
generate_keypoint_heatmaps=True,
heatmap_size=(
24,
24,
),
input_size=(
96,
96,
),
minimal_diagonal_length=5.656854249492381,
sigma=(
4,
2,
),
type='SPR')
custom_hooks = [
dict(type='SyncBuffersHook'),
]
data_mode = 'bottomup'
data_root = '/home/lmga-titanx/mmpose/data/testing_set/'
dataset_type = 'CocoDataset'
default_hooks = dict(
badcase=dict(
badcase_thr=5,
enable=False,
metric_type='loss',
out_dir='badcase',
type='BadCaseAnalysisHook'),
checkpoint=dict(
interval=10,
rule='greater',
save_best='coco/AP',
type='CheckpointHook'),
logger=dict(interval=50, type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'),
visualization=dict(enable=False, type='PoseVisualizationHook'))
default_scope = 'mmpose'
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
find_unused_parameters = True
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(
by_epoch=True, num_digits=6, type='LogProcessor', window_size=50)
model = dict(
backbone=dict(
extra=dict(
stage1=dict(
block='BOTTLENECK',
num_blocks=(4, ),
num_branches=1,
num_channels=(64, ),
num_modules=1),
stage2=dict(
block='BASIC',
num_blocks=(
4,
4,
),
num_branches=2,
num_channels=(
32,
64,
),
num_modules=1),
stage3=dict(
block='BASIC',
num_blocks=(
4,
4,
4,
),
num_branches=3,
num_channels=(
32,
64,
128,
),
num_modules=4),
stage4=dict(
block='BASIC',
multiscale_output=True,
num_blocks=(
4,
4,
4,
4,
),
num_branches=4,
num_channels=(
32,
64,
128,
256,
),
num_modules=3)),
in_channels=3,
init_cfg=dict(
checkpoint=
'https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w32-36af842e.pth',
type='Pretrained'),
type='HRNet'),
data_preprocessor=dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
std=[
58.395,
57.12,
57.375,
],
type='PoseDataPreprocessor'),
head=dict(
decoder=dict(
decode_max_instances=30,
generate_keypoint_heatmaps=True,
heatmap_size=(
24,
24,
),
input_size=(
96,
96,
),
minimal_diagonal_length=5.656854249492381,
sigma=(
4,
2,
),
type='SPR'),
displacement_loss=dict(
beta=0.1111111111111111,
loss_weight=0.002,
supervise_empty=False,
type='SoftWeightSmoothL1Loss',
use_target_weight=True),
heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True),
in_channels=480,
num_keypoints=2,
type='DEKRHead'),
neck=dict(concat=True, type='FeatureMapProcessor'),
test_cfg=dict(
align_corners=False,
flip_test=True,
multiscale_test=False,
nms_dist_thr=0.05,
shift_heatmap=True),
type='BottomupPoseEstimator')
optim_wrapper = dict(optimizer=dict(lr=0.001, type='Adam'))
param_scheduler = [
dict(
begin=0, by_epoch=False, end=500, start_factor=0.001, type='LinearLR'),
dict(
begin=0,
by_epoch=True,
end=300,
gamma=0.1,
milestones=[
200,
260,
],
type='MultiStepLR'),
]
resume = False
test_cfg = dict()
test_dataloader = dict(
batch_size=1,
dataset=dict(
ann_file='annotations/person_keypoints_valid.json',
data_mode='bottomup',
data_prefix=dict(img='images/'),
data_root='/home/lmga-titanx/mmpose/data/testing_set/',
metainfo=dict(
from_file=
'/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
),
pipeline=[
dict(type='LoadImage'),
dict(
input_size=(
96,
96,
),
resize_mode='expand',
size_factor=32,
type='BottomupResize'),
dict(
meta_keys=(
'id',
'img_id',
'img_path',
'crowd_index',
'ori_shape',
'img_shape',
'input_size',
'input_center',
'input_scale',
'flip',
'flip_direction',
'flip_indices',
'raw_ann_info',
'skeleton_links',
),
type='PackPoseInputs'),
],
test_mode=True,
type='CocoDataset'),
drop_last=False,
num_workers=1,
persistent_workers=True,
sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
ann_file=
'/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
nms_mode='none',
score_mode='keypoint',
type='CocoMetric')
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=20)
train_dataloader = dict(
batch_size=10,
dataset=dict(
ann_file='annotations/person_keypoints_train.json',
data_mode='bottomup',
data_prefix=dict(img='images/'),
data_root='/home/lmga-titanx/mmpose/data/testing_set/',
metainfo=dict(
from_file=
'/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
),
pipeline=[
dict(type='LoadImage'),
dict(input_size=(
96,
96,
), type='BottomupRandomAffine'),
dict(direction='horizontal', type='RandomFlip'),
dict(
encoder=dict(
decode_max_instances=30,
generate_keypoint_heatmaps=True,
heatmap_size=(
24,
24,
),
input_size=(
96,
96,
),
minimal_diagonal_length=5.656854249492381,
sigma=(
4,
2,
),
type='SPR'),
type='GenerateTarget'),
dict(type='PackPoseInputs'),
],
type='CocoDataset'),
num_workers=2,
persistent_workers=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
dict(type='LoadImage'),
dict(input_size=(
96,
96,
), type='BottomupRandomAffine'),
dict(direction='horizontal', type='RandomFlip'),
dict(
encoder=dict(
decode_max_instances=30,
generate_keypoint_heatmaps=True,
heatmap_size=(
24,
24,
),
input_size=(
96,
96,
),
minimal_diagonal_length=5.656854249492381,
sigma=(
4,
2,
),
type='SPR'),
type='GenerateTarget'),
dict(type='PackPoseInputs'),
]
val_cfg = dict()
val_dataloader = dict(
batch_size=1,
dataset=dict(
ann_file='annotations/person_keypoints_valid.json',
data_mode='bottomup',
data_prefix=dict(img='images/'),
data_root='/home/lmga-titanx/mmpose/data/testing_set/',
metainfo=dict(
from_file=
'/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
),
pipeline=[
dict(type='LoadImage'),
dict(
input_size=(
96,
96,
),
resize_mode='expand',
size_factor=32,
type='BottomupResize'),
dict(
meta_keys=(
'id',
'img_id',
'img_path',
'crowd_index',
'ori_shape',
'img_shape',
'input_size',
'input_center',
'input_scale',
'flip',
'flip_direction',
'flip_indices',
'raw_ann_info',
'skeleton_links',
),
type='PackPoseInputs'),
],
test_mode=True,
type='CocoDataset'),
drop_last=False,
num_workers=1,
persistent_workers=True,
sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
ann_file=
'/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
nms_mode='none',
score_mode='keypoint',
type='CocoMetric')
val_pipeline = [
dict(type='LoadImage'),
dict(
input_size=(
96,
96,
),
resize_mode='expand',
size_factor=32,
type='BottomupResize'),
dict(
meta_keys=(
'id',
'img_id',
'img_path',
'crowd_index',
'ori_shape',
'img_shape',
'input_size',
'input_center',
'input_scale',
'flip',
'flip_direction',
'flip_indices',
'raw_ann_info',
'skeleton_links',
),
type='PackPoseInputs'),
]
vis_backends = [
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend'),
]
visualizer = dict(
name='visualizer',
type='PoseLocalVisualizer',
vis_backends=[
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend'),
])
work_dir = './work_dirs/dekr_hrnet-w32_8xb10-140e_coco-512x512'
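A quick arithmetic check of the sizes in this config, as a sketch only. The values are copied from the config above. The interpretation is an assumption, not confirmed MMPose behavior: the head is taken to predict at one fixed stride of the input, and `concat=True` in the neck is taken to stack all four HRNet branches along the channel axis.

```python
# Values copied from the config above.
input_size = (96, 96)
heatmap_size = (24, 24)
stage4_channels = (32, 64, 128, 256)
head_in_channels = 480

# Assumption: the heatmap is a uniformly downsampled view of the input.
stride = (input_size[0] / heatmap_size[0], input_size[1] / heatmap_size[1])

# Assumption: concatenating the branches sums their channel counts.
concat_channels = sum(stage4_channels)

print("stride:", stride)
print("concat channels:", concat_channels, "head expects:", head_in_channels)
```

Under these assumptions the channel count and the stride are internally consistent, so a shape mismatch is unlikely to be the cause. The `512x512` in `work_dir` only reflects the name of the config file, not the 96x96 input actually used.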
mmpose/configs/_base_/datasets/custom_2.py:
dataset_info = dict(
dataset_name='apple_calyx_coco',
paper_info=dict(
author='Lin, Tsung-Yi and Maire, Michael and '
'Belongie, Serge and Hays, James and '
'Perona, Pietro and Ramanan, Deva and '
r'Doll{\'a}r, Piotr and Zitnick, C Lawrence',
title='Microsoft coco: Common objects in context',
container='European conference on computer vision',
year='2014',
homepage='http://cocodataset.org/',
),
keypoint_info={
0:
dict(name='calyx', id=0, color=[0,0,255], swap=''),
1:
dict(name='stem', id=1, color=[255,0,0], swap='')
},
flip_pairs = [0,1],
flip_index = [0,1],
skeleton_info={},
joint_weights=[1.] * 2,
sigmas=[0.2, 0.2])
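Since the config uses both `RandomFlip` and `flip_test=True`, it may be worth checking which flip mapping this metainfo implies. The sketch below is an illustration under an assumption, not MMPose's actual code: it assumes flip indices are derived from each keypoint's `swap` field, with an empty `swap` meaning the keypoint maps to itself. The helper name `flip_indices_from_swap` is hypothetical. Whether the extra `flip_pairs` and `flip_index` keys are read at all is not established here.

```python
def flip_indices_from_swap(keypoint_info):
    """Map each keypoint id to the id of its horizontal-flip partner."""
    name_to_id = {kpt["name"]: kpt_id for kpt_id, kpt in keypoint_info.items()}
    indices = []
    for kpt_id in sorted(keypoint_info):
        swap = keypoint_info[kpt_id].get("swap", "")
        # An empty swap name means the keypoint has no left/right partner.
        indices.append(name_to_id[swap] if swap else kpt_id)
    return indices


# keypoint_info copied from custom_2.py above.
keypoint_info = {
    0: dict(name="calyx", id=0, color=[0, 0, 255], swap=""),
    1: dict(name="stem", id=1, color=[255, 0, 0], swap=""),
}
print(flip_indices_from_swap(keypoint_info))
```

With both `swap` fields empty, this mapping is the identity, which is what one would want for calyx and stem, since neither is the mirror image of the other.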
Reproduces the problem - command or script
python tools/train.py /home/lmga-titanx/openmmlab/mmpose/configs/apple/DEKR/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py
Reproduces the problem - error message
Additional information
Goal: I'm trying to predict two keypoints of two classes (one keypoint per class) on a single object instance in each image.
Problem: The AP and AR of the model decrease as the loss decreases, which doesn't seem right. The latest checkpoint predicts no keypoints. The best checkpoint (300 epoch) was able to predict some keypoints, but with very low keypoint scores (keypoint_scores: array([[ 0.01671034, -0.00024851]])).
Extra information: The dataset shouldn't be the problem, as the same data was used to train a bottom-up Associative Embedding model with the previous 0.x version and the results were good.
It might be helpful to use the dataset browser to visualize the data and check whether the annotations and labels are reasonable. A heatmap size of 24 might be too small for sigmas of 2 and 4.
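To put a rough number on the heatmap-size concern, the sketch below counts what fraction of a 24x24 heatmap a single Gaussian blob covers. It assumes the plain unnormalized form exp(-d^2 / (2 * sigma^2)) centered in the middle of the map, and the 0.1 threshold is an arbitrary choice for illustration. The SPR codec's actual target generation may differ in detail. The function name `gaussian_coverage` is hypothetical.

```python
import math


def gaussian_coverage(heatmap_size, sigma, threshold=0.1):
    """Fraction of pixels where a centered Gaussian exceeds `threshold`."""
    w, h = heatmap_size
    cx, cy = (w - 1) / 2, (h - 1) / 2
    above = 0
    for y in range(h):
        for x in range(w):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if math.exp(-d2 / (2 * sigma ** 2)) > threshold:
                above += 1
    return above / (w * h)


for sigma in (2, 4):
    print("sigma", sigma, "coverage", round(gaussian_coverage((24, 24), sigma), 3))
```

By the continuous approximation, the region above the threshold has area pi * 2 * sigma^2 * ln(1 / threshold), so at a threshold of 0.1 a sigma of 2 covers roughly a tenth of the 576 pixels and a sigma of 4 roughly two fifths. A target that broad leaves little room to localize a keypoint precisely, which is consistent with the suggestion to revisit the heatmap size or the sigmas.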