human-pose-estimation.pytorch

question about flip test

Open Liz66666 opened this issue 6 years ago • 6 comments

Hi, thanks for your code! But I have a question about the flip test. I don't understand why you apply a shift to output_flipped here. The comment says "feature is not aligned, shift flipped heatmap for higher accuracy" — what is the meaning of this operation? https://github.com/Microsoft/human-pose-estimation.pytorch/blob/8ed745798439f247c85c57392428320d4c553654/lib/core/function.py#L121-L125

Liz66666 avatar Dec 19 '18 10:12 Liz66666

@YoungZiyu "To stabilize the predictions, we evaluate both the original image and its flipped version, and average their output heatmaps." (This sentence is from "Self Adversarial Training for Human Pose Estimation".) It is common to evaluate both the original image and its flipped version for higher accuracy, because the predictions are not stable enough; specifically, the prediction for the same point may not land at the same position across the two runs. Averaging the output heatmaps is also used in CPN, Self Adversarial Training for Human Pose Estimation, and Deep High-Resolution Representation Learning for Human Pose Estimation. You can visit this page for more details.
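A minimal sketch of the averaging step described above (the function name is ours, not from the repo; the repo's `flip_back` additionally does the same left/right channel swap via `flip_pairs`, which we include here, but this sketch omits the one-pixel shift that this issue is about):

```python
import numpy as np

def flip_test_average(output, output_from_flipped, flip_pairs):
    """Average heatmaps from the original and horizontally flipped image.

    output, output_from_flipped: (batch, joints, H, W) heatmap arrays,
    where output_from_flipped was predicted on the flipped input.
    flip_pairs: list of (left_joint_idx, right_joint_idx) tuples.
    """
    # Undo the horizontal flip along the width axis.
    flipped_back = output_from_flipped[..., ::-1].copy()
    # A flipped image swaps left and right body parts, so swap the
    # corresponding joint channels back.
    for a, b in flip_pairs:
        flipped_back[:, [a, b]] = flipped_back[:, [b, a]]
    return (output + flipped_back) * 0.5
```

If the model were perfectly equivariant to flipping, the flipped-back heatmaps would match the originals exactly and averaging would change nothing; in practice they differ slightly, and the average is more stable.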

738654805 avatar Mar 10 '19 08:03 738654805

But why is the one-pixel shift on the flipped output performed?

FrancescoPiemontese avatar Apr 16 '19 11:04 FrancescoPiemontese

Consider a 1-D situation. Assume the input image size is 16, and annotate some "important" pixels with the digits 1 to 8, with `-` for the unimportant pixels, so the input image looks like:

`[1, -, -, 2, 3, -, -, 4, 5, -, -, 6, 7, -, -, 8]`

Then, due to the padding mechanism in strided conv layers, on the stride-4 feature map (of size 16 // 4 = 4) the pixel centers projected onto the input image are:

- original feature map: `[1, 3, 5, 7]`
- flipped feature map: `[8, 6, 4, 2]`
- after flipping back, without shift: `[2, 4, 6, 8]`
- with shift: `[2, 2, 4, 6]`

Note the distances in the input image: [2,3], [4,5] and [6,7] are neighboring pixels, while [1,2], [3,4], [5,6] and [7,8] are not — they are stride - 1 apart. So after the shift, the flipped-back map `[2, 2, 4, 6]` lines up with the original `[1, 3, 5, 7]` at neighboring input pixels, and the two averaged heatmaps peak at nearly the same locations.
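The 1-D walkthrough above can be reproduced numerically (a toy sketch with our own variable names; `0` stands in for the unimportant `-` pixels, and feature-map cell `i` is taken to project onto input pixel `i * stride`, the padding convention assumed in the explanation):

```python
import numpy as np

# 16-pixel input with "important" pixels labeled 1..8.
labels = np.array([1, 0, 0, 2, 3, 0, 0, 4, 5, 0, 0, 6, 7, 0, 0, 8])
stride = 4

# Projected pixel centers on the stride-4 feature map.
feat = labels[::stride]                # original: [1, 3, 5, 7]
feat_flipped = labels[::-1][::stride]  # from the flipped input: [8, 6, 4, 2]
flipped_back = feat_flipped[::-1]      # flipped back, no shift: [2, 4, 6, 8]

# One-pixel shift, analogous to the repo's SHIFT_HEATMAP branch
# (output_flipped[..., 1:] = output_flipped[..., 0:-1]).
shifted = flipped_back.copy()
shifted[1:] = flipped_back[:-1]        # with shift: [2, 2, 4, 6]
```

Without the shift, the flipped-back map pairs `[1, 3, 5, 7]` with `[2, 4, 6, 8]`, positions that are stride - 1 = 3 input pixels apart; with the shift, `[3, 5, 7]` pair with `[2, 4, 6]`, which are adjacent input pixels.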

gen-ko avatar Feb 04 '20 23:02 gen-ko

hi @gen-ko, how do you work out the correct shift for a very complex model?

zimenglan-sysu-512 avatar Nov 09 '20 06:11 zimenglan-sysu-512

@zimenglan-sysu-512 try this "The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation"

DHCZ avatar Nov 09 '20 07:11 DHCZ

hi @DHCZ, thanks

zimenglan-sysu-512 avatar Nov 09 '20 10:11 zimenglan-sysu-512