
[Docs] Unclear Documentation on Multi-Class Output

Open perry-smartsizer opened this issue 1 year ago • 0 comments

📚 The doc issue

Hello,

I'm trying to get multi-class output working for the animalpose dataset, using a slightly adapted version of the td-hm_res50_8xb64-210e_animalpose-256x256.py config, shown below:

_base_ = ['../../../_base_/default_runtime.py']

# runtime
train_cfg = dict(max_epochs=210, val_interval=10)

# optimizer
optim_wrapper = dict(optimizer=dict(
    type='Adam',
    lr=5e-4,
))

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', begin=0, end=500, start_factor=0.001,
        by_epoch=False),  # warm-up
    dict(
        type='MultiStepLR',
        begin=0,
        end=210,  # must cover the milestones below; matches max_epochs
        milestones=[170, 200],
        gamma=0.1,
        by_epoch=True)
]

# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=512)

# hooks
default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))

# codec settings
codec = dict(
    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

# model settings
model = dict(
    type='TopdownPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
    ),
    head=dict(
        type='HeatmapHead',
        in_channels=2048,
        out_channels=20,
        loss=dict(type='KeypointMSELoss', use_target_weight=True),
        decoder=codec),
    test_cfg=dict(
        flip_test=True,
        flip_mode='heatmap',
        shift_heatmap=True,
    ))

# base dataset settings
dataset_type = 'AnimalPoseDataset'
data_mode = 'topdown'
data_root = '../assets'

# pipelines
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(type='RandomBBoxTransform'),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='GenerateTarget', encoder=codec),
    dict(type='PackPoseInputs')
]
val_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='PackPoseInputs')
]

# data loaders
train_dataloader = dict(
    batch_size=64,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=f'{data_root}/train', 
        data_mode=data_mode,
        ann_file='annotations/train.json',
        data_prefix=dict(img='images/'),  
        metainfo=dict(from_file='configs/_base_/datasets/animalpose.py'),
        pipeline=train_pipeline,
    ))
val_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        data_root=f'{data_root}/val', 
        data_mode=data_mode,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/'),
        metainfo=dict(from_file='configs/_base_/datasets/animalpose.py'),
        test_mode=True,  # val/test datasets should run in test mode
        pipeline=val_pipeline,
    ))
test_dataloader = val_dataloader

# evaluators
val_evaluator = [
    dict(type='PCKAccuracy', thr=0.2),
    dict(type='AUC'),
    dict(type='EPE'),
]
test_evaluator = val_evaluator

# Results 
results_root = '../results'

work_dir = results_root + '/td-hm_res50_8xb64-100e_animalpose-256x256'

After training completes, I load the checkpoint and run inference:

import cv2

from mmpose.apis import init_model, inference_topdown

# config_file / checkpoint_file point at the config above and the trained
# checkpoint; image_path is any test image
model = init_model(config_file, checkpoint_file, device='cpu')

# Read image
image = cv2.imread(image_path)

# Run inference
results = inference_topdown(model, image)

print("results = ", results)

This prints the output below, which contains no information about the class of the predicted instance. The animalpose dataset has five bounding-box classes (cow, horse, sheep, cat, dog), but that information doesn't appear anywhere in the results.

results =  [<PoseDataSample(

    META INFORMATION
    input_center: array([150. , 149.5], dtype=float32)
    input_size: (256, 256)
    img_shape: (299, 300)
    input_scale: array([375., 375.], dtype=float32)
    batch_input_shape: (256, 256)
    dataset_name: 'animalpose'
    flip_indices: [1, 0, 3, 2, 4, 5, 6, 7, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
    ori_shape: (299, 300)
    img_path: None
    pad_shape: (256, 256)

    DATA FIELDS
    gt_instances: <InstanceData(
        
            META INFORMATION
        
            DATA FIELDS
            bboxes: array([[  0.,   0., 300., 299.]], dtype=float32)
            bbox_scores: array([1.], dtype=float32)
            bbox_scales: array([[375., 375.]], dtype=float32)
        ) at 0x7fff29849190>
    pred_instances: <InstanceData(
        
            META INFORMATION
        
            DATA FIELDS
            bboxes: array([[  0.,   0., 300., 299.]], dtype=float32)
            keypoint_scores: array([[0.9160682 , 0.9097917 , 0.6466743 , 0.75962734, 0.67032546,
                        0.11103424, 0.56707996, 0.41349667, 0.36413836, 0.07535622,
                        0.4074468 , 0.4906253 , 0.47876698, 0.11326428, 0.3080265 ,
                        0.74965405, 0.29342115, 0.3382476 , 0.4011271 , 0.360986  ]],
                      dtype=float32)
            keypoints_visible: array([[0.9160682 , 0.9097917 , 0.6466743 , 0.75962734, 0.67032546,
                        0.11103424, 0.56707996, 0.41349667, 0.36413836, 0.07535622,
                        0.4074468 , 0.4906253 , 0.47876698, 0.11326428, 0.3080265 ,
                        0.74965405, 0.29342115, 0.3382476 , 0.4011271 , 0.360986  ]],
                      dtype=float32)
            keypoints: array([[[262.79297  , 162.6836   ],
                        [201.26953  , 156.82422  ],
                        [239.35547  , 224.20703  ],
                        [268.65234  ,  89.44141  ],
                        [177.83203  , 104.08984  ],
                        [110.44922  , 124.59766  ],
                        [113.37891  , 112.87891  ],
                        [ 48.92578  ,  98.23047  ],
                        [ 28.417969 ,  74.79297  ],
                        [277.4414   , 227.13672  ],
                        [160.2539   , 183.1914   ],
                        [ 34.277344 , 112.87891  ],
                        [  7.9101562, 112.87891  ],
                        [201.26953  , 241.78516  ],
                        [177.83203  , 227.13672  ],
                        [ 54.785156 , 127.52734  ],
                        [ 25.488281 , 136.3164   ],
                        [154.39453  , 139.2461   ],
                        [177.83203  ,  36.70703  ],
                        [ 13.769531 ,  42.566406 ]]], dtype=float32)
            bbox_scores: array([1.], dtype=float32)
        ) at 0x7fff34642110>
    gt_instance_labels: <InstanceData(
        
            META INFORMATION
        
            DATA FIELDS
        ) at 0x7fff29849cd0>
) at 0x7fff34642910>]
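For clarity, here is how I poke at those fields; there is simply no label/class entry to read:

pred = results[0].pred_instances

print(pred.keypoints.shape)        # (1, 20, 2): one instance, 20 keypoints
print(pred.keypoint_scores.shape)  # (1, 20)
print(pred.bboxes)                 # [[  0.   0. 300. 299.]]
print(hasattr(pred, 'labels'))     # False -- no class information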

While trying to learn more about this, I assumed I would need to alter the head section of the config to attach class outputs to the predicted bounding boxes, but that section of the documentation does not appear to be filled in yet. On a higher level, I'm also interested in how to build a custom dataset with multiple classes where each class has a different set of keypoints. The DeepFashion full config seems to be the closest existing example, but it is relatively simple. For animalpose this matters less, since every class shares the same keypoint count and mapping, but I'd eventually like to train a model whose classes have varying keypoint sets. The workaround I'm considering in the meantime is sketched below.
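The only workaround I can see right now is to keep the class label from an upstream detector and carry it next to the pose results, since inference_topdown accepts external bboxes (one PoseDataSample is returned per bbox). A rough sketch, with hypothetical detector outputs:

import numpy as np

from mmpose.apis import inference_topdown

# hypothetical (bbox, class) pairs from any upstream detector
detections = [
    (np.array([0., 0., 300., 299.], dtype=np.float32), 'cow'),
]

bboxes = np.stack([bbox for bbox, _ in detections])
labels = [cls for _, cls in detections]

pose_results = inference_topdown(model, image, bboxes=bboxes)
for cls, sample in zip(labels, pose_results):
    print(cls, sample.pred_instances.keypoints.shape)

But this still leaves the pose head itself class-agnostic, which is why documentation on multi-class heads and datasets would help.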

Thanks!

Suggest a potential alternative/fix

No response

perry-smartsizer · Jul 17 '24 04:07