PoseEstimationForMobile
Pretrained CPM model output resolution is only 24x24
(Debugged using the iOS demo app at https://github.com/tucan9389/PoseEstimation-CoreML)
The pretrained CPM model, once converted to CoreML, has an output dimension of 96x96; however, the effective resolution of its predictions is only one quarter of that. Every predicted keypoint coordinate is a multiple of 4, e.g. (4, 56), (24, 28), etc. In practice this means keypoint localization is 4 times coarser than the 96x96 output size suggests.
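This is exactly the pattern you would see if the network's real heatmap resolution is 24x24 and the 96x96 output is produced by nearest-neighbor upsampling: the argmax then always lands on the top-left corner of a 4x4 block. A minimal sketch (hypothetical, not the repo's actual conversion code) that reproduces the symptom:

```python
import numpy as np

# Hypothetical reproduction: a 24x24 heatmap (the suspected true output
# resolution) upsampled 4x to 96x96 by nearest-neighbor repetition.
rng = np.random.default_rng(0)
heatmap = rng.random((24, 24))

# Nearest-neighbor 4x upsampling: each cell becomes a constant 4x4 block.
upsampled = heatmap.repeat(4, axis=0).repeat(4, axis=1)  # shape (96, 96)

# np.argmax returns the first (top-left) index within the maximal 4x4
# block, so both coordinates come out as multiples of 4 -- matching the
# (44, 24), (48, 20), ... pattern in the logs.
y, x = np.unravel_index(np.argmax(upsampled), upsampled.shape)
print((x, y), x % 4 == 0 and y % 4 == 0)
```

If that is what is happening, the model has not lost accuracy relative to its training objective; the extra resolution in the 96x96 tensor is simply redundant.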
While debugging, I logged the predicted position of the top keypoint. Notice that both x and y are always multiples of 4:
Max top point is: (44,24) with confidence 0.68310546875
w/h: 96/96
Max top point is: (44,24) with confidence 0.6572265625
w/h: 96/96
Max top point is: (44,24) with confidence 0.677734375
w/h: 96/96
Max top point is: (44,24) with confidence 0.72900390625
w/h: 96/96
Max top point is: (48,20) with confidence 0.13720703125
w/h: 96/96
Max top point is: (84,84) with confidence 0.021026611328125
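One way to recover some sub-cell precision without retraining is the standard heatmap post-processing trick: take the argmax on the coarse map, then shift a quarter cell toward the larger neighbor before scaling up. This is a hedged sketch (my own helper, not code from this repo):

```python
import numpy as np

def refine_keypoint(heatmap, scale=4):
    """Argmax on a coarse heatmap plus a quarter-cell shift toward the
    higher-valued neighbor, a common post-processing refinement for
    low-resolution pose heatmaps. Returns (x, y) in the upscaled grid."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    h, w = heatmap.shape
    fx, fy = float(x), float(y)
    # Shift 0.25 cell toward whichever horizontal/vertical neighbor is larger.
    if 0 < x < w - 1:
        fx += 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        fy += 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    # Map back to the upscaled (e.g. 96x96) coordinate system.
    return fx * scale, fy * scale
```

With this, predicted coordinates land on a grid of 1 (in upscaled units) rather than 4, which is usually enough to remove the visible snapping in the demo overlay.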
Why does this occur? Does this have to do with the pretrained model itself? Should I train the network myself to yield a higher resolution?
I have the same problem here.
@edvardHua Do you have any idea?
Indeed, the network architecture has a large margin for improvement. We could follow the tips from the paper Convolutional Neural Networks at Constrained Time Cost to optimize it... but that takes time.