Realtime_Multi-Person_Pose_Estimation

About generating vector field in CPM_data_transformer.cpp

Open RuWang15 opened this issue 7 years ago • 12 comments

I read the code and I am confused about whether the vector field is generated merely from the positions of the joints, because I didn't find anything about putting the vectors in a region. https://github.com/CMU-Perceptual-Computing-Lab/caffe_train/blob/76dd9563fb24cb1702d0245cda7cc36ec2aed43b/src/caffe/cpm_data_transformer.cpp#L1137

RuWang15 avatar Dec 06 '17 07:12 RuWang15

For each PAF there are 2 layers: one with the X coordinate of the vector and one with the Y coordinate. Note the difference between (np + 1 + 2*i) and (np + 2 + 2*i).
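A minimal sketch of that channel indexing (the value of the np offset and the limb index are made-up placeholders, not the real config):

```python
# Hypothetical illustration of the PAF channel layout described above.
np_offset = 56   # assumed number of channels preceding the PAF layers
i = 3            # index of one limb connection (placeholder)

x_channel = np_offset + 1 + 2 * i   # layer holding the X component of the vector
y_channel = np_offset + 2 + 2 * i   # layer holding the Y component of the vector
```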

anatolix avatar Dec 09 '17 19:12 anatolix

@anatolix Thanks, but I'm still confused about whether the vector field has something to do with the area of the limbs. And where can I find this in the code?

RuWang15 avatar Dec 13 '17 05:12 RuWang15

I haven't fully understood the question. The code you've linked does the whole PAF generation.

The PAF looks approximately like this: [image: 0000003paf] (1 vector for each 8x8 image patch). The picture was generated with https://github.com/anatolix/keras_Realtime_Multi-Person_Pose_Estimation/blob/master/py_rmpe_server/rmpe_server_tester.py
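For intuition, here is a minimal sketch (not the rmpe_server_tester.py code; the array shape and the 8-pixel stride are assumptions) of how such a vector field could be drawn with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# paf: hypothetical array of shape (2, H, W) holding the X and Y vector
# components, one vector per 8x8 image patch.
paf = np.random.randn(2, 46, 46) * 0.1  # placeholder data

ys, xs = np.mgrid[0:paf.shape[1], 0:paf.shape[2]]
plt.quiver(xs * 8, ys * 8, paf[0], paf[1], angles='xy')
plt.gca().invert_yaxis()  # image coordinates: y grows downward
plt.show()
```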

anatolix avatar Dec 14 '17 13:12 anatolix

Thank you very much for the excellent example! I will explain my question with your picture. [image: paf] Take the vector field on the arm, for example. My question is: how do you know the area of the forearm, so that you can put the vectors on it rather than on the region beside the arm?

RuWang15 avatar Dec 18 '17 07:12 RuWang15

It doesn't know the area. (It could have used segmentation for that, but segmentation isn't actually used.) The PAF matching the hand's size is a lucky accident in this picture; there is no exact match for the other PAFs.

Currently it just computes the segment A->B and draws the PAF in every 8x8 square whose center is closer than 8 pixels to A->B. Look at the putVecMaps function for details. The line if(dist <= thre){ controls the PAF placement; the code above it calculates the distance to the segment.
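A minimal NumPy sketch of that placement logic (this is not the actual putVecMaps code; the names, stride and threshold are assumptions based on the description above):

```python
import numpy as np

def put_vec_map(paf_x, paf_y, a, b, stride=8, thre=8):
    """Write the unit vector a->b into every grid cell whose center lies
    within `thre` pixels of the segment a->b (sketch of the idea only)."""
    a = np.asarray(a, dtype=float) / stride     # joint A in grid coordinates
    b = np.asarray(b, dtype=float) / stride     # joint B in grid coordinates
    ab = b - a
    norm = np.linalg.norm(ab)
    if norm < 1e-8:
        return
    u = ab / norm                               # unit direction of the limb
    h, w = paf_x.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs - a[0], ys - a[1]
    # project each cell center onto the segment, then measure the distance
    t = np.clip((px * u[0] + py * u[1]) / norm, 0.0, 1.0)
    dist = np.hypot(px - t * ab[0], py - t * ab[1]) * stride  # back to pixels
    mask = dist <= thre                         # the `if(dist <= thre)` check
    paf_x[mask] = u[0]
    paf_y[mask] = u[1]
```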

anatolix avatar Dec 19 '17 18:12 anatolix

Thank you very much! One last question (actually more than one): is all the training data masked by mask_all or mask_miss before going through the network, with the masked parts black? And if a picture contains 5 annotated people, the picture appears 5 times in the data, right?

RuWang15 avatar Jan 01 '18 08:01 RuWang15

One last question (actually more than one): is all the training data masked by mask_all or mask_miss before going through the network, with the masked parts black?

mask_all is never used for anything except visualization. mask_miss is not actually a picture; it is an array of floats from 0.0 to 1.0, and the loss is simply multiplied by this mask. If you multiply mask_miss by 255 and convert it to integers, you will get something that looks like a picture where the masked parts are black and the non-masked parts are white. The original picture is not modified, except for the VGG preprocessing.
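A minimal sketch of that visualization trick, assuming mask_miss is a float array in [0.0, 1.0]:

```python
import numpy as np

# mask_miss: hypothetical float array in [0.0, 1.0], zero where
# annotations are missing.
mask_miss = np.ones((46, 46), dtype=np.float32)
mask_miss[10:20, 10:20] = 0.0                    # pretend this region is unannotated

mask_image = (mask_miss * 255).astype(np.uint8)  # black = masked, white = kept
```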

And if a picture contains 5 annotated people, the picture appears 5 times in the data, right?

The short answer is yes. The long answer is that some of them are filtered out, so we don't feed samples centered on people who are too close to each other.
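A rough sketch of that kind of filtering (the real rule in the data preparation code may differ; the distance threshold here is an assumption):

```python
import numpy as np

def main_person_centers(centers, min_dist=32.0):
    """Keep one training sample per annotated person, skipping people whose
    center is too close to an already kept one (sketch only)."""
    kept = []
    for c in centers:
        c = np.asarray(c, dtype=float)
        if all(np.linalg.norm(c - k) >= min_dist for k in kept):
            kept.append(c)
    return kept
```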

anatolix avatar Jan 02 '18 16:01 anatolix

You mean, for the VGG layers at the beginning of the network they use the masked pictures, and after these layers they use the original pictures? I'm really confused about the 'mask' part 😂

RuWang15 avatar Jan 04 '18 05:01 RuWang15

The mask never touches the pictures.

The mask has exactly the same dimensions as the ground truth and the network output, i.e. 46 x 46 x num_layers.

The mask is applied to:

  1. the ground truth heatmaps and PAFs (multiplied by the mask)
  2. the network output (multiplied by the mask)

If the mask is zero at some point, this means "ignore the answer at this point while training the network", because the loss will be zero there.
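A minimal sketch of how the mask enters the loss (names are placeholders; the real training code lives in the Caffe layers):

```python
import numpy as np

def masked_l2_loss(pred, gt, mask):
    """L2 loss with the mask applied to both the network output and the
    ground truth; wherever mask == 0 the contribution is exactly zero."""
    diff = pred * mask - gt * mask
    return np.sum(diff ** 2)
```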

There are some pictures of the masks here: https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation/issues/8#issuecomment-342977756

anatolix avatar Jan 04 '18 19:01 anatolix

@anatolix Thanks for your answers. But I would like to know the label format (i.e. the ground truth for the CPM and PAF branches). I mean, could you show the labels or ground truth of the CPM and PAF branches, and the output of every stage? I'm confused about the label format and the output of every stage. Thanks

Ai-is-light avatar Jan 19 '18 03:01 Ai-is-light

I am not sure I completely understand the question, but regarding the loss and the stages:

Actually, each stage has the same output format and exactly the same ground truth, and the loss is calculated at each stage. In an ideal world the last stage would be enough, but in the real world the network is very deep and the gradients from the last layer would be completely lost before reaching the early layers. To push them through the network, we "tweak" the middle layers in the right direction too. This tweak is called 'intermediate supervision', and if you want to know more about it you should read the previous work, "Convolutional Pose Machines": https://arxiv.org/pdf/1602.00134.pdf
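A minimal sketch of intermediate supervision, assuming a list of per-stage outputs that all share the same ground truth and mask:

```python
import numpy as np

def total_loss(stage_outputs, gt, mask):
    """Sum the same masked L2 loss over every stage, so gradients are
    injected at each intermediate stage, not only at the last one."""
    return sum(np.sum((out * mask - gt * mask) ** 2) for out in stage_outputs)
```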

anatolix avatar Jan 20 '18 00:01 anatolix