Pytorch_Realtime_Multi-Person_Pose_Estimation
not sure about correctness of the result
Above is the result I got after running test_pose.py with the converted PyTorch model. There are many blue points, and I'm wondering whether the algorithm is working correctly. What do these blue dots mean?
My Python version is 3.5, and I've adapted the original code according to another issue, "Sorry, please Python3 version". I'm running on CPU, and it takes about 40 s to finish processing. Do you know how long it will take when using CUDA?
Thank you in advance.
I don't know why there are so many blue (purple?) dots. The purple dot means right ear. You could try not drawing the eyes and ears; I don't think the model performs well on them.
And the test time depends on the test size and the number of candidate dots. In general, using a GPU without changing the test size, it takes around 2-5 s per image.
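If you want to measure this yourself, remember that CUDA calls are asynchronous, so synchronize before reading the clock. A minimal timing sketch (model and inp are placeholders, not this repo's API):

import time
import torch

def time_forward(model, inp, device='cuda'):
    # Time a single forward pass; warm up first so one-off CUDA
    # initialization is not included in the measurement.
    model = model.to(device).eval()
    inp = inp.to(device)
    with torch.no_grad():
        model(inp)                    # warm-up pass
        if device == 'cuda':
            torch.cuda.synchronize()  # flush queued GPU work
        start = time.time()
        model(inp)
        if device == 'cuda':
            torch.cuda.synchronize()
    return time.time() - start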
How can this network work in real time when it takes several seconds to process a single frame?
And can I delete entries in limbSeq and mapIdx to avoid computing keypoints for the eyes and ears? Which body parts do the elements in the limb sequence stand for?
@KaiWU17TUM, I met the same situation. On some pictures, the model I trained with this code predicts a lot of random blue points. I wonder where the bug is. How did you address this issue?
@KaiWU17TUM @lzj322 I met the same situation... Do you know how to solve this problem?
I also get a similar result with the original image:
Make sure the test image has had the mean subtracted and been divided by the std. What's more, pay attention to the padValue.
And please make sure limbSeq corresponds to mapIdx.
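By "pay attention to the padValue" I mean the image must be padded to a multiple of the network stride before inference. A minimal sketch of that step, assuming a stride-8 model; the function name and signature are illustrative, not the repo's exact helper:

import numpy as np

def pad_right_down(img, stride=8, pad_value=0):
    # Pad on the bottom/right so height and width are multiples of stride.
    h, w = img.shape[:2]
    pad_h = (stride - h % stride) % stride
    pad_w = (stride - w % stride) % stride
    padded = np.full((h + pad_h, w + pad_w) + img.shape[2:],
                     pad_value, dtype=img.dtype)
    padded[:h, :w] = img
    return padded, (pad_h, pad_w)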
I tried the following normalize functions:
import numpy as np

def normalize(origin_img):
    # Per-image standardization: zero mean, unit (sample) std.
    origin_img = np.array(origin_img, dtype=np.float32)
    origin_img -= np.mean(origin_img)
    origin_img /= np.std(origin_img, ddof=1)
    return origin_img
and
def normalize(origin_img):
    # Fixed scaling: maps [0, 255] roughly to [-0.5, 0.5].
    origin_img = np.array(origin_img, dtype=np.float32)
    origin_img -= 128.0
    origin_img /= 256.0
    return origin_img
but both give the same result.
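For reference, the second variant is algebraically the same as the x/256 - 0.5 scaling that, as far as I know, the original CMU test code applies:

# (x - 128) / 256 == x / 256 - 0.5  (np as imported above)
normalized = np.float32(origin_img) / 256.0 - 0.5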
I did not change any other variable:
padValue = 0.
limbSeq = [[3,4], [4,5], [6,7], [7,8], [9,10], [10,11], [12,13], [13,14], [1,2], [2,9], [2,12], [2,3], [2,6], \
[3,17],[6,18],[1,16],[1,15],[16,18],[15,17]]
mapIdx = [[19,20],[21,22],[23,24],[25,26],[27,28],[29,30],[31,32],[33,34],[35,36],[37,38],[39,40], \
[41,42],[43,44],[45,46],[47,48],[49,50],[51,52],[53,54],[55,56]]
Should I somehow modify them?
The values of limbSeq and mapIdx depend on your own task. limbSeq lists the keypoint pairs connected by each PAF, and mapIdx gives each PAF's channel indices in the heatmap output.
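Assuming the standard COCO-18 part ordering that OpenPose uses (that is an assumption; verify it against your own training setup), you can decode limbSeq into part names, and drop the eye/ear limbs by filtering both lists in lockstep:

# Assumed COCO-18 order, 1-based to match the indices in limbSeq.
PARTS = ['nose', 'neck', 'Rsho', 'Relb', 'Rwri', 'Lsho', 'Lelb', 'Lwri',
         'Rhip', 'Rkne', 'Rank', 'Lhip', 'Lkne', 'Lank',
         'Reye', 'Leye', 'Rear', 'Lear']

def describe(limb_seq):
    # Print which two body parts each limb connects.
    for a, b in limb_seq:
        print(PARTS[a - 1], '->', PARTS[b - 1])

def drop_eye_ear(limb_seq, map_idx):
    # Parts 15-18 are eyes/ears; limbSeq and mapIdx must stay aligned,
    # so filter both lists with the same index set.
    keep = [i for i, (a, b) in enumerate(limb_seq) if a < 15 and b < 15]
    return [limb_seq[i] for i in keep], [map_idx[i] for i in keep]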
When I run the example "ski" image, I get exactly the same result as @guist. How should I modify the code to make the result right? Many thanks.
There are two reasons for this problem:
- the padding value you use, 128 or 0
- the order of limbSeq should be consistent with last-one's version rather than the original OpenPose version. @leonlulu
Thanks for the quick reply @ybai62868. I didn't change anything in test_pose.py before I ran the test shell. So let me see... the padding value I use is 0, and the limbSeq is the same as last-one's, I guess.
Hi @ybai62868 and @leonlulu, do you have any solutions for this issue?
I just made it work as expected by replacing some lines with OpenPose's code. Revised parts:
# find connection in the specified sequence, center 29 is in the position 15
limbSeq = [[2,3], [2,6], [3,4], [4,5], [6,7], [7,8], [2,9], [9,10], \
[10,11], [2,12], [12,13], [13,14], [2,1], [1,15], [15,17], [1,16], \
[16,18], [3,17], [6,18]]
# the middle joints heatmap correspondence
mapIdx = [[31,32], [39,40], [33,34], [35,36], [41,42], [43,44], [19,20], [21,22], \
[23,24], [25,26], [27,28], [29,30], [47,48], [49,50], [53,54], [51,52], \
[55,56], [37,38], [45,46]]
for part in range(19-1):  # loop over the 18 keypoint channels, excluding background
@guist @KaiWU17TUM I guess you have taken the official OpenPose model, or a model converted from Caffe. In the official OpenPose output, background is the 18th channel and the nose is the 0th channel, but in this repo background is the 0th channel. Thus you get many noses (actually background points after NMS).
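If that is the cause, a possible workaround is to move the channels into this repo's layout before the NMS step. A hedged sketch, assuming the heatmap is (H, W, 19) in the official layout (parts in channels 0-17, background in 18); verify the layout on your own model first:

import numpy as np

def to_repo_layout(heatmap):
    # Move background from the last channel to channel 0, which is what
    # this repo's post-processing reportedly expects.
    return np.concatenate([heatmap[:, :, -1:], heatmap[:, :, :-1]], axis=2)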