pytorch_Realtime_Multi-Person_Pose_Estimation
Training on my own dataset
Hi,
Thanks for sharing this great repo. One thing I'd like to know is how to train on my own dataset. Could you provide more code so we can train from scratch? Thanks!
The data loader needs some refactoring, I think; at the moment you need to understand how the data extraction works. I will work on it when I have time. :)
Hi @tensorboy ,
Thanks for the reply. Actually, what I want to do is regress all the hand keypoints, and my dataset contains only egocentric hands. In this case, can I follow the same procedure as for human pose detection? Another question: what is the shape of the network output? Since we don't know the number of objects in the image, how is the size of the output determined? Can you give me some hints on this? Thank you very much!
I'm at home now; I can reply tomorrow with a detailed description.
Hi, @tensorboy
Thank you. I'm looking forward to your reply.
Hi, @pharrellyhy. The output of the network is 19 heatmaps and 38 PAF channels; you can search for 19 and 38 here: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation/blob/master/network/rtpose_vgg.py and change them accordingly.
The input of the network is (batch_size, 3, 368, 368), and the output size is (batch_size, 19, 46, 46) for the heatmaps and (batch_size, 38, 46, 46) for the PAFs. The network is fully convolutional with a stride of 8, which means that if your input size is (batch_size, 3, 256, 256), the output size will be (*, *, 32, 32).
COCO_data_pipeline.py contains most of what needs to be modified, such as how images and labels are read: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation/blob/master/training/datasets/coco_data/COCO_data_pipeline.py
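(A minimal sketch of the size and channel arithmetic described above; the function names are illustrative and not part of the repo:)

```python
STRIDE = 8  # the network downsamples its input by a factor of 8

def output_hw(input_h, input_w):
    """Spatial size of the heatmap/PAF outputs for a given input size."""
    return input_h // STRIDE, input_w // STRIDE

def output_channels(n_keypoints, n_limbs):
    """Heatmap channels (keypoints + background) and PAF channels (x/y per limb)."""
    return n_keypoints + 1, 2 * n_limbs

assert output_hw(368, 368) == (46, 46)      # default training resolution
assert output_hw(256, 256) == (32, 32)      # the example above
assert output_channels(18, 19) == (19, 38)  # COCO: 18 keypoints, 19 limbs
```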
Hi, @tensorboy,
It's really helpful. I will take a look at those files and let you know if I have any other questions. Thanks!
You are welcome; let me know if you have any further questions. :)
Hi, @tensorboy
A few more questions:
- Where does the mask2014 folder come from?
- I saw there are different preprocessing methods for different base networks. The original base network in the paper is the first ten layers of VGG19. I assume 'rtpose' means 'real-time pose', so if I follow the procedure in the original paper, I should use 'rtpose', right? If that's true, what is the 'vgg' preprocessing for?
Hi, @tensorboy
How do I calculate scale_provided for each image in the meta?
Hi, @pharrellyhy.
- The mask2014 folder comes from the original repo; it is the mask for unlabeled persons, and you can set it to all zeros if every person in your images is labeled. As far as I remember, that weighting did not have a significant impact on the final result.
- The original 'rtpose' (VGG19 backbone) used 'rtpose' preprocessing (subtract 128, then divide by 128), but VGG19 was trained with 'vgg' preprocessing; I believe that is a mistake in the original repo. See the sketch below.
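(A minimal sketch of the two preprocessing variants mentioned above; the 'vgg' statistics are assumed to be the standard torchvision ImageNet values:)

```python
import numpy as np

def rtpose_preprocess(img):
    """'rtpose' preprocessing: (pixel - 128) / 128, mapping [0, 255] to ~[-1, 1]."""
    return (img.astype(np.float32) - 128.0) / 128.0

def vgg_preprocess(img):
    """'vgg' preprocessing: per-channel ImageNet mean/std, RGB channel order."""
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (img.astype(np.float32) / 255.0 - mean) / std
```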
Hi, @pharrellyhy. You can check how 'scale_provided' is computed here:
https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation/blob/d1b034051841313ec46551873ec22c7ff0a0d767/training/genJSON.m#L116
Hi, @tensorboy
Thanks for the information. I really appreciate it.
Unfortunately, I don't have MATLAB installed in my environment, so I don't know exactly what bbox(4) is in joint_all(count).scale_provided = RELEASE(i).annorect(p).bbox(4)/368;. Can you provide more hints? Thanks!
bbox(4) is probably the height or width of the person's bounding box (MATLAB bounding boxes are typically [x, y, width, height], which would make bbox(4) the height).
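(A hedged Python equivalent of the MATLAB line above, assuming an [x, y, width, height] bounding box and the 368-pixel training resolution:)

```python
TRAIN_SIZE = 368  # network input resolution used in genJSON.m

def scale_provided(bbox):
    """scale_provided = person bounding-box height / training input size."""
    x, y, w, h = bbox  # bbox(4) in MATLAB corresponds to h here
    return h / TRAIN_SIZE
```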
Hi, @tensorboy
I just trained the network last week (I got distracted by other things for a month). I am not using PAFs for now and only want to regress the fingertips of the hand. In this setting I treat the fingertips of all fingers as the same type, so the heatmap has shape (*, *, *, 2) including the background. After training, the loss converged (~1.2 per stage; I did not use mask_miss, and I multiplied the loss by 44 * 22 * 2. The output size is 44x22.), but the network does not seem to have learned anything useful. The outputs always look the same, ranging from -0.3 to 0.03; it looks like the network just minimizes the loss trivially. Do you have any idea what could cause this? Thanks!
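(For reference, a minimal sketch of an rtpose-style two-channel target as described in the comment above: one fingertip channel plus a background channel. The 44x22 output size comes from the comment; the sigma value and function name are assumptions:)

```python
import numpy as np

def make_target(points, h=22, w=44, sigma=1.0):
    """Build an (h, w, 2) target: a fingertip channel plus a background channel.

    points: iterable of (x, y) fingertip coordinates in output-map pixels.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    fg = np.zeros((h, w), dtype=np.float32)
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        fg = np.maximum(fg, g)  # per-pixel max over all fingertip Gaussians
    bg = 1.0 - fg               # background = 1 - foreground
    return np.stack([fg, bg], axis=-1)  # channels last, matching (*, *, *, 2)
```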
@pharrellyhy
Hi,
Were you able to successfully train on a custom dataset? If yes, could you let me know the process?
Hi, @tensorboy. I found the 19 heatmap and 38 PAF outputs you mentioned, but when I try to train the model on a new dataset with 20 keypoints and 15 keypoint connections, a problem occurs: RuntimeError: The size of tensor a (38) must match the size of tensor b (30) at non-singleton dimension 1. I think it means my heatmap (and PAF) shapes do not match the network output. When I then try to change the network's output channels, another problem appears: RuntimeError: Given groups=1, weight of size [128, 185, 7, 7], expected input[8, 179, 46, 46] to have 185 channels, but got 179 channels instead. I think I changed data_dir correctly, but the channels of the predictions and the labels still do not match. I do not know how to change the code to train on a new dataset correctly. Could you give me some ideas?
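(The second error is consistent with the refinement-stage input-channel arithmetic: each later stage takes the 128 VGG feature channels concatenated with the previous stage's heatmaps and PAFs, so every stage's input convolutions must be resized too, not just the final outputs. A sketch of the arithmetic, not of the repo's actual code:)

```python
FEATURES = 128  # VGG feature channels fed into every refinement stage

def stage_in_channels(n_keypoints, n_limbs):
    heatmaps = n_keypoints + 1  # +1 for the background channel
    pafs = 2 * n_limbs          # x- and y-components per limb
    return FEATURES + heatmaps + pafs

assert stage_in_channels(18, 19) == 185  # COCO: what the pretrained layers expect
assert stage_in_channels(20, 15) == 179  # the custom dataset in the error above
```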
I changed the number of keypoints and keypoint connections to match COCO, and training works. But when I test, I get: RuntimeError: Error(s) in loading state_dict for rtpose_model: Missing key(s) in state_dict: "model0.0.weight", "model0.0.bias", "model0.2.weight", "model0.2.bias", "model0.5.weight", "model0.5.bias", "model0.7.weight", "model0.7.bias"
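(A hedged debugging sketch: missing model0.* keys often mean the checkpoint was saved from a torch.nn.DataParallel wrapper, whose keys carry a 'module.' prefix, or was saved without the VGG trunk. The checkpoint path is a placeholder, and get_model is assumed to be the factory in network/rtpose_vgg.py:)

```python
import torch
from network.rtpose_vgg import get_model  # assumed repo interface

model = get_model('vgg19')  # must match the definition used during training
state = torch.load('checkpoint.pth', map_location='cpu')  # placeholder path
state = state.get('state_dict', state)  # some checkpoints nest the weights
# Strip a DataParallel 'module.' prefix if present.
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
result = model.load_state_dict(state, strict=False)
print('missing keys:', result.missing_keys)
print('unexpected keys:', result.unexpected_keys)
```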