human-pose-estimation.pytorch
question about flip test
Hi, thanks for your code! I have a question about the flip test. I don't understand why you shift output_flipped here. The comment says "feature is not aligned, shift flipped heatmap for higher accuracy". What is the meaning of this operation? https://github.com/Microsoft/human-pose-estimation.pytorch/blob/8ed745798439f247c85c57392428320d4c553654/lib/core/function.py#L121-L125
@YoungZiyu "To stabilize the predictions, we evaluate both the original image and its flipped version, and average their output heatmaps." (This sentence is from "Self Adversarial Training for Human Pose Estimation".) It is common to use both the original image and its flipped version for higher accuracy, because the predictions are not stable enough. To be specific, the same point may not be predicted at the same position in the two passes. Averaging the output heatmaps is also used in CPN, Self Adversarial Training for Human Pose Estimation, and Deep High-Resolution Representation Learning for Human Pose Estimation. You can visit this page for more details.
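As a rough illustration of the averaging described above, here is a minimal 1-D sketch in plain Python. It is not the repository's code: `flip_test`, `toy_model` and `flip_pairs` are hypothetical names, and the left/right joint swap is an assumption about how mirrored predictions are usually matched back to the original joints.

```python
def flip_test(model, image, flip_pairs):
    """Average the heatmaps of an image and of its mirrored copy (1-D toy)."""
    heatmaps = model(image)              # one heatmap (list) per joint
    mirrored = model(image[::-1])        # predict on the flipped input
    mirrored = [h[::-1] for h in mirrored]  # flip the heatmaps back
    # Assumption: after mirroring, left and right joints trade places,
    # so their channels are swapped before averaging.
    for left, right in flip_pairs:
        mirrored[left], mirrored[right] = mirrored[right], mirrored[left]
    return [
        [(a + b) / 2 for a, b in zip(h, m)]
        for h, m in zip(heatmaps, mirrored)
    ]


def toy_model(image):
    """Hypothetical stand-in for a network: two 'joint' heatmaps."""
    return [list(image), [2 * x for x in image]]


averaged = flip_test(toy_model, [1, 0, 0, 0], flip_pairs=[])
```

With a perfectly flip-consistent model like this toy one, the average equals the original prediction; a real network is not exactly consistent, which is why averaging helps.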
But why is the one pixel shift on the flipped output performed?
Consider the 1-D case.
Assume the input image has size 16. We label some 'important' pixels with the digits 1 to 8 and mark the other, unimportant pixels with -, so the input image looks like
[1, -, -, 2, 3, -, -, 4, 5, -, -, 6, 7, -, -, 8]
Then, due to the padding mechanism in strided conv layers, on the stride-4 feature map (whose size is 16 // 4 = 4), the pixel centers projected onto the input image will be
[1, 3, 5, 7]
On the flipped feature map, they will be
[8, 6, 4, 2]
After flipping back, without the shift, we get
[2, 4, 6, 8]
With the shift, we get
[2, 2, 4, 6]
Note the distances in the input image: [2,3], [4,5] and [6,7] are neighboring pixels, while [1,2], [3,4], [5,6] and [7,8] are not; they are stride - 1 = 3 pixels apart. So after the shift, every position except the first is compared with a pixel only one step away, instead of stride - 1 steps away.
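The toy example above can be replayed in a few lines of plain Python. This is only a sketch under the same assumptions as the explanation: each feature-map pixel is taken to be centered on the first input pixel of its stride-wide window, and the shift copies every column one position to the right while leaving column 0 unchanged. The helper names `centers` and `distances` are made up for illustration.

```python
STRIDE = 4
image = [1, "-", "-", 2, 3, "-", "-", 4, 5, "-", "-", 6, 7, "-", "-", 8]


def centers(img, stride=STRIDE):
    """Input pixels that the stride-4 feature-map pixels are centered on (assumed)."""
    return img[::stride]


def distances(a, b):
    """Distance in the input image between paired labels of two feature maps."""
    return [abs(image.index(x) - image.index(y)) for x, y in zip(a, b)]


original = centers(image)          # [1, 3, 5, 7]
flipped = centers(image[::-1])     # [8, 6, 4, 2]
flipped_back = flipped[::-1]       # [2, 4, 6, 8]
# Shift right by one column; column 0 keeps its old value.
shifted = flipped_back[:1] + flipped_back[:-1]   # [2, 2, 4, 6]

no_shift = distances(original, flipped_back)     # [3, 3, 3, 3]
with_shift = distances(original, shifted)        # [3, 1, 1, 1]
```

Without the shift every pair is stride - 1 = 3 pixels apart; with it, all pairs but the first are adjacent.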
Hi @gen-ko, how do I get the actual shift for a very complex model?
@zimenglan-sysu-512 Try this paper: "The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation".
hi @DHCZ, thanks