pytorch_Realtime_Multi-Person_Pose_Estimation

Training on my own dataset

Open pharrellyhy opened this issue 5 years ago • 17 comments

Hi,

Thanks for sharing this great repo. One thing I'd like to know is how to train my own dataset. Can you provide more code so we can train from scratch? Thanks!

pharrellyhy avatar Sep 17 '18 15:09 pharrellyhy

The data loader needs some refactoring, I think; currently you need to understand how the data extraction works. I'll work on it when I have time. :)

tensorboy avatar Sep 19 '18 02:09 tensorboy

Hi @tensorboy ,

Thanks for the reply. What I actually want to do is regress all the hand keypoints; my dataset contains only egocentric hands. In this case, can I follow the same procedure as for human pose estimation? Another question: what is the shape of the network output? Since we don't know the number of objects in the image, how is the size of the output determined? Can you give me some hints on this? Thank you very much!

pharrellyhy avatar Sep 19 '18 04:09 pharrellyhy

I'm at home now; I'll reply tomorrow with a detailed description.

tensorboy avatar Sep 19 '18 04:09 tensorboy

Hi, @tensorboy

Thank you. I'm looking forward to your reply.

pharrellyhy avatar Sep 19 '18 04:09 pharrellyhy

Hi, @pharrellyhy. The network outputs 19 heatmaps and 38 PAF channels; you can search for 19 and 38 here: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation/blob/master/network/rtpose_vgg.py and change them accordingly.

The input of the network is (batch_size, 3, 368, 368), and the output size is (batch_size, 19, 46, 46) for the heatmaps and (batch_size, 38, 46, 46) for the PAFs. The network is fully convolutional with a stride of 8, so if your input size is (batch_size, 3, 256, 256), the output size will be (*, *, 32, 32).

COCO_data_pipeline.py contains most of what needs to be modified, such as how images and labels are read: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation/blob/master/training/datasets/coco_data/COCO_data_pipeline.py

tensorboy avatar Sep 19 '18 18:09 tensorboy
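
To make the shape arithmetic concrete, here is a small sketch (the names are illustrative, not the repo's API). The channel counts follow the COCO convention: 18 keypoints plus one background channel give 19 heatmaps, and 19 limbs times two vector components give 38 PAF channels:

```python
# Sketch of the stride-8 shape arithmetic described above.
N_KEYPOINTS = 18
N_LIMBS = 19
HEAT_CH = N_KEYPOINTS + 1   # 19 heatmaps (keypoints + background)
PAF_CH = 2 * N_LIMBS        # 38 PAF channels (x/y field per limb)
STRIDE = 8                  # total downsampling of the fully conv network

def output_shapes(batch_size, in_h, in_w):
    out_h, out_w = in_h // STRIDE, in_w // STRIDE
    heatmaps = (batch_size, HEAT_CH, out_h, out_w)
    pafs = (batch_size, PAF_CH, out_h, out_w)
    return heatmaps, pafs

print(output_shapes(1, 368, 368))  # (1, 19, 46, 46), (1, 38, 46, 46)
print(output_shapes(1, 256, 256))  # (1, 19, 32, 32), (1, 38, 32, 32)
```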

Hi, @tensorboy,

It's really helpful. I will take a look at those files and let you know if I have any other questions. Thanks!

pharrellyhy avatar Sep 20 '18 01:09 pharrellyhy

You're welcome; let me know if you have any further questions. :)

tensorboy avatar Sep 20 '18 02:09 tensorboy

Hi, @tensorboy

A few more questions:

  1. Where does the mask2014 folder come from?
  2. I saw that there are different preprocessing methods for different base networks. The original base network in the paper is the first ten layers of VGG19. I assume 'rtpose' means 'real-time pose', so if I follow the procedure in the original paper, I should use 'rtpose', right? If that's true, what is the preprocessing for 'vgg'?

pharrellyhy avatar Sep 20 '18 02:09 pharrellyhy

Hi, @tensorboy

How is scale_provided calculated for each image in the meta?

pharrellyhy avatar Sep 20 '18 09:09 pharrellyhy

Hi, @pharrellyhy.

  1. The mask2014 folder comes from the original repo; it holds the masks for unlabelled people, and you can make the mask trivial (mask nothing out) if every person in your images is labeled. As far as I can remember, that weighting didn't have a significant impact on the final result.

  2. The original 'rtpose' (VGG19 backbone) used the 'rtpose' preprocessing (subtract 128, then divide by 128), but the VGG19 weights were trained with the 'vgg' preprocessing; I believe that is a mistake in the original repo. (A sketch of both variants follows below.)

tensorboy avatar Sep 20 '18 17:09 tensorboy
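
For reference, a minimal sketch of the two normalizations. The 'rtpose' constants are exactly as described above; the 'vgg' variant shown is the standard ImageNet normalization that torchvision's pretrained VGG19 expects, so check the repo's preprocessing code for the exact constants it uses:

```python
import numpy as np

def rtpose_preprocess(img):
    """The 'rtpose' normalization described above: (img - 128) / 128."""
    return (img.astype(np.float32) - 128.0) / 128.0   # roughly [-1, 1]

def vgg_preprocess(img):
    """Standard ImageNet normalization for pretrained VGG19 (HWC, RGB);
    check the repo's preprocessing code for its exact constants."""
    img = img.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (img - mean) / std
```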

Hi, @pharrellyhy, you can see how 'scale_provided' is computed here:

https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation/blob/d1b034051841313ec46551873ec22c7ff0a0d767/training/genJSON.m#L116

tensorboy avatar Sep 20 '18 17:09 tensorboy

Hi, @tensorboy

Thanks for all that information. I really appreciate it.

Unfortunately, I don't have MATLAB installed in my environment, so I don't know exactly what bbox(4) is in joint_all(count).scale_provided = RELEASE(i).annorect(p).bbox(4)/368;. Can you provide more hints? Thanks!

pharrellyhy avatar Sep 21 '18 02:09 pharrellyhy

bbox(4) may be the height or width of the image.

tensorboy avatar Sep 21 '18 16:09 tensorboy
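
For what it's worth, here is a Python rendering of that MATLAB line, assuming bbox is an [x, y, w, h] array, so that MATLAB's 1-indexed bbox(4) is the box height (bbox[3] in Python):

```python
BOX_SIZE = 368  # the network input resolution the original code divides by

def scale_provided(bbox, box_size=BOX_SIZE):
    """Python equivalent of RELEASE(i).annorect(p).bbox(4)/368, assuming
    bbox = [x, y, w, h]; MATLAB's bbox(4) is then the box height."""
    return bbox[3] / box_size

print(scale_provided([12, 30, 90, 184]))  # 184 / 368 = 0.5
```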

Hi, @tensorboy

I just trained the network last week (I got distracted by other things for a month). I'm not using PAFs for now; I only want to regress the fingertips of the hand. In this setting I treat the fingertips of all fingers as the same keypoint type, so the heatmap has shape (*, *, *, 2) including the background channel. The loss converged (~1.2 per stage; I did not use mask_miss, and I multiplied the loss by 44 * 22 * 2, the output size being 44x22), but the network doesn't seem to have learned anything useful. The outputs always look the same, ranging from -0.3 to 0.03; it looks like the network just minimizes the loss toward a constant. Do you have any idea what's happening here? Thanks!

pharrellyhy avatar Oct 22 '18 09:10 pharrellyhy
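
In case it helps with debugging, here is a minimal sketch of the loss setup as I read the description above, using PyTorch's channels-first layout; the names are illustrative. It shows where mask_miss would enter and what multiplying the loss by 44 * 22 * 2 amounts to:

```python
import torch
import torch.nn.functional as F

def stage_loss(pred, target, mask=None, scale_by_size=True):
    """Per-stage heatmap MSE.
    pred/target: (B, C, H, W); mask: (B, 1, H, W), 1 = keep, 0 = ignore."""
    if mask is not None:                 # mask_miss; skipped in the setup above
        pred, target = pred * mask, target * mask
    loss = F.mse_loss(pred, target)      # mean over all B*C*H*W elements
    if scale_by_size:
        _, c, h, w = pred.shape
        loss = loss * c * h * w          # e.g. 2 * 22 * 44: mean -> per-image sum
    return loss

# toy check with the 2-channel, 44x22 output described above
pred = torch.randn(8, 2, 22, 44)
target = torch.zeros(8, 2, 22, 44)
print(stage_loss(pred, target).item())
```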

@pharrellyhy

Hi,

Were you able to successfully train on a custom dataset? If yes, could you let me know the process?

meetdave06 avatar Dec 27 '18 09:12 meetdave06

Hi, @tensorboy

I found the 19 heatmaps and 38 PAFs you described above, but when I try to train the model on a new dataset with 20 keypoints and 15 keypoint connections, I get:

RuntimeError: The size of tensor a (38) must match the size of tensor b (30) at non-singleton dimension 1.

It seems my heatmap (and vec/PAF) shapes don't match the network output. When I then changed the network's output channels, another problem appeared:

RuntimeError: Given groups=1, weight of size [128, 185, 7, 7], expected input[8, 179, 46, 46] to have 185 channels, but got 179 channels instead.

I think I changed data_dir correctly, but the channels between the predictions and the labels still don't match, and I don't know what to change to train on a new dataset correctly. Could you give me some ideas?

thisimyusername avatar Oct 22 '21 20:10 thisimyusername
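
For what it's worth, the two errors above are consistent with the refinement-stage input channels being hard-coded for COCO: stage 1 consumes the 128-channel VGG feature map, while later stages consume that feature map concatenated with the previous stage's heatmaps and PAFs, so their first conv expects 128 + 19 + 38 = 185 channels. With 20 keypoints (plus one background channel, 21 heatmaps) and 15 connections (30 PAF channels), that becomes 128 + 21 + 30 = 179, which matches the error, so every hard-coded 185 in rtpose_vgg.py would need updating. A sketch of the arithmetic (names are illustrative, not the repo's variables):

```python
# Channel arithmetic for a CMU-style multi-stage network.
n_keypoints, n_limbs = 20, 15

heat_ch = n_keypoints + 1        # +1 background -> 21   (COCO: 18 + 1 = 19)
paf_ch = 2 * n_limbs             # x/y field per limb -> 30 (COCO: 2*19 = 38)
feat_ch = 128                    # VGG-19 feature channels

stage1_in = feat_ch                       # 128, unchanged
refine_in = feat_ch + heat_ch + paf_ch    # 128 + 21 + 30 = 179 (COCO: 185)
print(refine_in)                          # every hard-coded 185 -> this value
```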

I changed the number of keypoints and keypoint connections to match COCO, and it trains. But when I test:

RuntimeError: Error(s) in loading state_dict for rtpose_model: Missing key(s) in state_dict: "model0.0.weight", "model0.0.bias", "model0.2.weight", "model0.2.bias", "model0.5.weight", "model0.5.bias", "model0.7.weight", "model0.7.bias"

thisimyusername avatar Oct 24 '21 02:10 thisimyusername
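
Not something from the repo itself, but missing model0.* keys usually mean the checkpoint's keys don't line up with the freshly built model, for example because it was saved from a torch.nn.DataParallel wrapper (which prefixes every key with 'module.') or because only part of the model was saved. A hedged sketch for diagnosing this; the function name and checkpoint path are placeholders:

```python
import torch

def load_checkpoint(model, path):
    """Load a checkpoint whose keys may not line up with the model, e.g.
    one saved from a torch.nn.DataParallel wrapper ('module.' prefix)."""
    state = torch.load(path, map_location="cpu")
    state = state.get("state_dict", state)         # unwrap if nested
    cleaned = {k[len("module."):] if k.startswith("module.") else k: v
               for k, v in state.items()}
    missing, unexpected = model.load_state_dict(cleaned, strict=False)
    print("missing:", missing)        # keys the model wants but the file lacks
    print("unexpected:", unexpected)  # keys in the file the model ignores
    return model
```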