Pretrained CPM model output resolution is only 24x24

Open · sfktrifork opened this issue · 2 comments

(Debugged using the iOS demo app at https://github.com/tucan9389/PoseEstimation-CoreML)

The output dimension of the pretrained CPM model when converted to CoreML is 96x96; however, the effective resolution of its predictions appears to be only a quarter of that. Every predicted keypoint coordinate is a multiple of 4, e.g. (4, 56), (24, 28), etc. In effect, the predictions are four times coarser than the 96x96 output suggests.

While debugging, I logged the predicted position of the top keypoint. Notice that both x and y are always multiples of 4 (see the sketch after the log for one way to verify this across the whole heatmap).

Max top point is: (44,24) with confidence 0.68310546875
w/h: 96/96
Max top point is: (44,24) with confidence 0.6572265625
w/h: 96/96
Max top point is: (44,24) with confidence 0.677734375
w/h: 96/96
Max top point is: (44,24) with confidence 0.72900390625
w/h: 96/96
Max top point is: (48,20) with confidence 0.13720703125
w/h: 96/96
Max top point is: (84,84) with confidence 0.021026611328125
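
To make the pattern concrete, here is a rough Swift check one could run against the CoreML output (this is not code from either repo, and the [channels, height, width] layout of the MLMultiArray is an assumption; adjust the indexing if your converted model differs). If every 4x4 block of the 96x96 heatmap holds a single value, the map is effectively a nearest-neighbor upsample of a 24x24 map.

```swift
import Foundation
import CoreML

// Rough check: if every factor x factor block of the heatmap channel holds a
// single value, the map behaves like a nearest-neighbor upsample of a coarser
// grid (here 96 / 4 = 24). Assumes the MLMultiArray is laid out as
// [channels, height, width] and that height/width are multiples of `factor`.
func looksLikeNearestNeighborUpsample(_ heatmap: MLMultiArray,
                                      channel: Int,
                                      factor: Int = 4) -> Bool {
    let height = heatmap.shape[1].intValue
    let width  = heatmap.shape[2].intValue
    for blockY in stride(from: 0, to: height, by: factor) {
        for blockX in stride(from: 0, to: width, by: factor) {
            let reference = heatmap[[channel, blockY, blockX] as [NSNumber]].doubleValue
            for dy in 0..<factor {
                for dx in 0..<factor {
                    let value = heatmap[[channel, blockY + dy, blockX + dx] as [NSNumber]].doubleValue
                    if abs(value - reference) > 1e-6 { return false }
                }
            }
        }
    }
    return true
}
```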

Why does this occur? Does this have to do with the pretrained model itself? Should I train the network myself to yield a higher resolution?
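
In the meantime, a common workaround for coarse heatmaps is sub-pixel refinement of the argmax via a confidence-weighted centroid. Below is a minimal sketch under the same assumed [channels, height, width] layout; it is not from either repo, and it only recovers sub-cell precision around the peak rather than fixing the underlying 24x24 resolution (on a block-constant 96x96 map you would want to apply it on the effective 24x24 grid and scale the result back up).

```swift
import Foundation
import CoreML

// Standard weighted-centroid refinement around an integer argmax (px, py).
// Averages the coordinates in a small window, weighted by heatmap confidence,
// to produce a fractional peak position.
func refinePeak(in heatmap: MLMultiArray, channel: Int,
                px: Int, py: Int, radius: Int = 1) -> (x: Double, y: Double) {
    let height = heatmap.shape[1].intValue
    let width  = heatmap.shape[2].intValue
    var weightSum = 0.0, xSum = 0.0, ySum = 0.0
    for y in max(0, py - radius)...min(height - 1, py + radius) {
        for x in max(0, px - radius)...min(width - 1, px + radius) {
            let confidence = heatmap[[channel, y, x] as [NSNumber]].doubleValue
            weightSum += confidence
            xSum += confidence * Double(x)
            ySum += confidence * Double(y)
        }
    }
    guard weightSum > 0 else { return (Double(px), Double(py)) }
    return (xSum / weightSum, ySum / weightSum)
}
```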

sfktrifork · Feb 03 '19

I have the same problem here. @edvardHua, do you have any idea? [attached screenshot: img_1254]

tucan9389 · Feb 04 '19

Indeed, the network architecture has a large margin for improvement. We could follow the tips from the paper Convolutional Neural Networks at Constrained Time Cost to optimize it, but that takes time.

edvardHua · Feb 07 '19