
Training on MSCOCO keypoint dataset

Open · mkocabas opened this issue · 2 comments

Hi @anewell ,

First of all thank you for sharing your code.

I'm preparing COCO keypoint annotations and a dataset-specific interface file to train on COCO. I've mostly finished, except for one issue. The MPII dataset provides a head bounding box for each person, which COCO doesn't have. To work around this I've tried the following:

  • If the shoulders are visible, use the distance between them to define the head bbox; otherwise, use the person bbox and an idealized human body ratio to define the head bbox.

But this approach is error-prone when the image doesn't contain the whole body.

Do you have any advice on how to properly define the head size?

The code snippet below belongs to the src/misc/convert_annot.py file:

import math
import numpy as np

# Find shoulder coordinates. ann['keypoints'] is the flat COCO list
# [x1, y1, v1, x2, y2, v2, ...]; in COCO order, index 5 is the left
# shoulder and index 6 the right (the original snippet had them swapped,
# which was harmless here since only their distance is used).
left_shoulder = (ann['keypoints'][0::3][5], ann['keypoints'][1::3][5])
right_shoulder = (ann['keypoints'][0::3][6], ann['keypoints'][1::3][6])

# If either shoulder is unlabeled (COCO stores it as (0, 0)), approximate
# the head bbox from the person bbox, which is [x, y, width, height].
if left_shoulder == (0, 0) or right_shoulder == (0, 0):
    diff = np.array([ann['bbox'][3] / 7.5, ann['bbox'][2] / 7.5], float)
    normalization = np.linalg.norm(diff) * .6

# If both shoulders are visible, define the head bbox from the distance
# between them.
else:
    dist = math.hypot(right_shoulder[0] - left_shoulder[0],
                      right_shoulder[1] - left_shoulder[1])
    diff = np.array([dist / 2, dist / 1.5], float)
    normalization = np.linalg.norm(diff) * .6

annot['normalize'] += [normalization]

mkocabas · Sep 07 '17

Hi @mkocabas

This is a tough issue, especially on COCO. In the official COCO evaluation, normalization is done by the pixel area of the person's segmentation mask, but that can be a fairly inconsistent indicator of person size.
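For reference, a minimal sketch of that area-based scale in Python, assuming the standard COCO annotation dicts where 'area' holds the pixel area of the segmentation mask:

import math

def area_scale(ann):
    # sqrt of the mask area gives a rough side-length proxy for person size
    return math.sqrt(ann['area'])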

An alternative is to compute all possible limb lengths given a particular set of ground truth keypoints and compare these to an average baseline for each limb. The relative ratio will give an indication of the person's size, and by computing the ratio across all annotated limbs there will be some robustness if some limbs happen to be foreshortened.
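A minimal sketch of that idea in Python; the COCO limb pairs are standard, but the BASELINE lengths are placeholder numbers you would replace with dataset-wide averages:

import math

# COCO keypoint indices: 5/6 shoulders, 7/8 elbows, 9/10 wrists,
# 11/12 hips, 13/14 knees, 15/16 ankles.
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),        # arms
         (11, 13), (13, 15), (12, 14), (14, 16), # legs
         (5, 6), (11, 12), (5, 11), (6, 12)]     # torso

# Placeholder baseline limb lengths (pixels at reference scale 1.0);
# in practice, estimate these as averages over your training set.
BASELINE = {(5, 7): 60, (7, 9): 55, (6, 8): 60, (8, 10): 55,
            (11, 13): 80, (13, 15): 75, (12, 14): 80, (14, 16): 75,
            (5, 6): 70, (11, 12): 50, (5, 11): 95, (6, 12): 95}

def limb_ratio_scale(keypoints):
    # keypoints is the flat COCO list [x1, y1, v1, x2, y2, v2, ...]
    xs, ys, vs = keypoints[0::3], keypoints[1::3], keypoints[2::3]
    ratios = []
    for a, b in LIMBS:
        if vs[a] > 0 and vs[b] > 0:  # both endpoints annotated
            length = math.hypot(xs[a] - xs[b], ys[a] - ys[b])
            ratios.append(length / BASELINE[(a, b)])
    if not ratios:
        return None
    ratios.sort()
    return ratios[len(ratios) // 2]  # median resists foreshortened limbs

Taking the median rather than the mean is what buys the robustness to foreshortening mentioned above.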

No matter what you end up doing, the ground truth "size" will be pretty noisy. The best thing you can do is experiment with different ideas and visually inspect which one leads to the most reliable cropping of input figures. The network should have the capacity to learn some degree of scale invariance, and it is worth adding scale data augmentation during training anyway.
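As a sketch of that augmentation (assuming OpenCV for the resize; any image library works), jittering the scale of both the image and the keypoints:

import random
import cv2

def random_scale(img, keypoints, lo=0.75, hi=1.25):
    # jitter the person scale so the network sees a range of sizes
    s = random.uniform(lo, hi)
    out = cv2.resize(img, None, fx=s, fy=s)
    kps = list(keypoints)
    for i in range(0, len(kps), 3):  # x, y, v triples; scale x and y only
        kps[i] *= s
        kps[i + 1] *= s
    return out, kps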

Hope that helps a bit, and let me know if you figure out a more reliable measure of scale.

anewell · Sep 11 '17

@anewell, @mkocabas

What tool did you use to annotate the keypoints of a human body?

arnitkun · Jun 12 '18