[Docs] Unclear Documentation on Multi-Class Output
📚 The doc issue
Hello,
I'm trying to get multi-class output working for the animalpose dataset. I'm working with a slightly adapted version of the td-hm_res50_8xb64-210e_animalpose-256x256.py config, shown below:
_base_ = ['../../../_base_/default_runtime.py']

# runtime
train_cfg = dict(max_epochs=210, val_interval=10)

# optimizer
optim_wrapper = dict(optimizer=dict(
    type='Adam',
    lr=5e-4,
))

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', begin=0, end=500, start_factor=0.001,
        by_epoch=False),  # warm-up
    dict(
        type='MultiStepLR',
        begin=0,
        end=100,
        milestones=[170, 200],
        gamma=0.1,
        by_epoch=True)
]

# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=512)

# hooks
default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))

# codec settings
codec = dict(
    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

# model settings
model = dict(
    type='TopdownPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
    ),
    head=dict(
        type='HeatmapHead',
        in_channels=2048,
        out_channels=20,
        loss=dict(type='KeypointMSELoss', use_target_weight=True),
        decoder=codec),
    test_cfg=dict(
        flip_test=True,
        flip_mode='heatmap',
        shift_heatmap=True,
    ))

# base dataset settings
dataset_type = 'AnimalPoseDataset'
data_mode = 'topdown'
data_root = '../assets'

# pipelines
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(type='RandomBBoxTransform'),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='GenerateTarget', encoder=codec),
    dict(type='PackPoseInputs')
]
val_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='PackPoseInputs')
]

# data loaders
train_dataloader = dict(
    batch_size=64,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=f'{data_root}/train',
        data_mode=data_mode,
        ann_file='annotations/train.json',
        data_prefix=dict(img='images/'),
        metainfo=dict(from_file='configs/_base_/datasets/animalpose.py'),
        pipeline=train_pipeline,
    ))
val_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        data_root=f'{data_root}/val',
        data_mode=data_mode,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/'),
        metainfo=dict(from_file='configs/_base_/datasets/animalpose.py'),
        test_mode=False,
        pipeline=val_pipeline,
    ))
test_dataloader = val_dataloader

# evaluators
val_evaluator = [
    dict(type='PCKAccuracy', thr=0.2),
    dict(type='AUC'),
    dict(type='EPE'),
]
test_evaluator = val_evaluator

# Results
results_root = '../results'
work_dir = results_root + '/td-hm_res50_8xb64-100e_animalpose-256x256'
After training completes, I load the checkpoint and run inference:
import cv2

from mmpose.apis import init_model, inference_topdown

# config_file / checkpoint_file point to the config above and the trained weights
model = init_model(config_file, checkpoint_file, device='cpu')

# Read image
image = cv2.imread(image_path)

# Run inference (no bboxes given, so the whole image is treated as one instance)
results = inference_topdown(model, image)
print("results = ", results)
This produces the output below, which contains no information on the class of the predicted instance. The animalpose dataset has five bounding box classes (cow, horse, sheep, cat, dog), but that information doesn't appear anywhere in the results.
results = [<PoseDataSample(

    META INFORMATION
    input_center: array([150. , 149.5], dtype=float32)
    input_size: (256, 256)
    img_shape: (299, 300)
    input_scale: array([375., 375.], dtype=float32)
    batch_input_shape: (256, 256)
    dataset_name: 'animalpose'
    flip_indices: [1, 0, 3, 2, 4, 5, 6, 7, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
    ori_shape: (299, 300)
    img_path: None
    pad_shape: (256, 256)

    DATA FIELDS
    gt_instances: <InstanceData(
        META INFORMATION
        DATA FIELDS
        bboxes: array([[ 0., 0., 300., 299.]], dtype=float32)
        bbox_scores: array([1.], dtype=float32)
        bbox_scales: array([[375., 375.]], dtype=float32)
    ) at 0x7fff29849190>
    pred_instances: <InstanceData(
        META INFORMATION
        DATA FIELDS
        bboxes: array([[ 0., 0., 300., 299.]], dtype=float32)
        keypoint_scores: array([[0.9160682 , 0.9097917 , 0.6466743 , 0.75962734, 0.67032546,
            0.11103424, 0.56707996, 0.41349667, 0.36413836, 0.07535622,
            0.4074468 , 0.4906253 , 0.47876698, 0.11326428, 0.3080265 ,
            0.74965405, 0.29342115, 0.3382476 , 0.4011271 , 0.360986 ]],
            dtype=float32)
        keypoints_visible: array([[0.9160682 , 0.9097917 , 0.6466743 , 0.75962734, 0.67032546,
            0.11103424, 0.56707996, 0.41349667, 0.36413836, 0.07535622,
            0.4074468 , 0.4906253 , 0.47876698, 0.11326428, 0.3080265 ,
            0.74965405, 0.29342115, 0.3382476 , 0.4011271 , 0.360986 ]],
            dtype=float32)
        keypoints: array([[[262.79297 , 162.6836 ],
            [201.26953 , 156.82422 ],
            [239.35547 , 224.20703 ],
            [268.65234 , 89.44141 ],
            [177.83203 , 104.08984 ],
            [110.44922 , 124.59766 ],
            [113.37891 , 112.87891 ],
            [ 48.92578 , 98.23047 ],
            [ 28.417969 , 74.79297 ],
            [277.4414 , 227.13672 ],
            [160.2539 , 183.1914 ],
            [ 34.277344 , 112.87891 ],
            [ 7.9101562, 112.87891 ],
            [201.26953 , 241.78516 ],
            [177.83203 , 227.13672 ],
            [ 54.785156 , 127.52734 ],
            [ 25.488281 , 136.3164 ],
            [154.39453 , 139.2461 ],
            [177.83203 , 36.70703 ],
            [ 13.769531 , 42.566406 ]]], dtype=float32)
        bbox_scores: array([1.], dtype=float32)
    ) at 0x7fff34642110>
    gt_instance_labels: <InstanceData(
        META INFORMATION
        DATA FIELDS
    ) at 0x7fff29849cd0>
) at 0x7fff34642910>]
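In the meantime, the only workaround I can think of is to run a separate detector first, keep its class labels on the side, and pass only the boxes to inference_topdown, since the returned PoseDataSamples follow the order of the input boxes. A minimal sketch, assuming the MMDetection 3.x inference API; det_config, det_checkpoint, and the 0.5 score threshold are placeholders of my own, not anything from the mmpose docs:

import cv2
from mmdet.apis import inference_detector, init_detector
from mmpose.apis import inference_topdown, init_model

# Placeholder detector assumed to be trained on the 5 animalpose box classes.
det_model = init_detector(det_config, det_checkpoint, device='cpu')
pose_model = init_model(config_file, checkpoint_file, device='cpu')

image = cv2.imread(image_path)

# Detection stage: the boxes come with class labels here,
# which the pose stage will not carry along.
det_sample = inference_detector(det_model, image)
pred = det_sample.pred_instances.cpu().numpy()
keep = pred.scores > 0.5                # placeholder threshold
bboxes = pred.bboxes[keep]              # (N, 4), xyxy
labels = pred.labels[keep]              # (N,) class indices from the detector

# Pose stage: one PoseDataSample per box, in the same order as `bboxes`,
# so labels[i] is the class of pose_results[i].
pose_results = inference_topdown(pose_model, image, bboxes=bboxes)
for label, sample in zip(labels, pose_results):
    print(label, sample.pred_instances.keypoints.shape)

This pairs each pose result with a class after the fact, but the pose model itself stays class-agnostic.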
When trying to learn more about this, I assumed I would need to alter the head of the config in some way to add class outputs to the bounding boxes, but that section of the documentation does not appear to be filled in yet.

On a higher level, I'm also interested in how one could instantiate a custom dataset with multiple classes, where each class has a different set of keypoints; a rough sketch of what I have in mind follows below. The deepfashion full config seems to be the closest thing to what I'm interested in, but that config is relatively simple. For animalpose this isn't so important, since every class shares the same number of keypoints and the same mapping, but I'd eventually like to implement a model covering a number of classes with varying keypoints.
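To make that concrete, here is roughly what I imagine the dataset side could look like, based on the CombinedDataset and KeypointConverter utilities I found for mixed-dataset training. Everything specific below is invented for illustration: the unified 25-keypoint skeleton, the metainfo file path, the per-class data roots, and the index mappings. I don't know whether this is the intended approach:

# Hypothetical: each class has its own annotations and keypoint definition,
# and KeypointConverter remaps them into one unified 25-keypoint space.
num_unified_keypoints = 25  # assumed size of the merged skeleton

dataset_cat = dict(
    type='AnimalPoseDataset',
    data_root='data/cat',  # made-up path
    data_mode='topdown',
    ann_file='annotations/train.json',
    pipeline=[
        dict(
            type='KeypointConverter',
            num_keypoints=num_unified_keypoints,
            # (source_index, target_index) pairs; values are illustrative
            mapping=[(0, 0), (1, 1), (2, 2)],
        )
    ],
)
dataset_dog = dict(
    type='AnimalPoseDataset',
    data_root='data/dog',  # made-up path
    data_mode='topdown',
    ann_file='annotations/train.json',
    pipeline=[
        dict(
            type='KeypointConverter',
            num_keypoints=num_unified_keypoints,
            mapping=[(0, 3), (1, 4), (2, 5)],  # different targets per class
        )
    ],
)

train_dataloader = dict(
    batch_size=64,
    num_workers=4,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CombinedDataset',
        # metainfo describing the unified skeleton (this file is hypothetical)
        metainfo=dict(from_file='configs/_base_/datasets/my_unified_skeleton.py'),
        datasets=[dataset_cat, dataset_dog],
        # shared transforms applied after conversion; reuses the train_pipeline
        # from the config above
        pipeline=train_pipeline,
    ))

If this is the right direction, I assume the head's out_channels would also have to become 25 to match the unified skeleton, but confirmation in the docs would help.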
Thanks!
Suggest a potential alternative/fix
The sketches above are my best guesses at a workaround and a dataset setup, but documentation on the intended approach would be much appreciated.