
How can I train with my own images?

Open bemoregt opened this issue 7 years ago • 5 comments

Hi, @anewell

How can I train with my own images?

I don't know how to edit the train.h5 file.

What's the meaning of the h5 file's many tables and their columns ...

Thanks in advance ~

bemoregt avatar Mar 31 '17 03:03 bemoregt

The main thing you want to do is create a file equivalent to util/dataset/mpii.lua as a reference for loading your own data. This provides a uniform interface between the rest of the code and your particular dataset. You don’t have to use an hdf5 file for this. To switch between datasets you can change the default option in opts.lua or call the code with the argument ‘-dataset [your-dataset]’.

  1. Initialization starts with some basic reference variables:
  • nJoints: The number of joints you will be predicting. When initializing new models, the default number of output channels is defined by nJoints.
  • accIdxs: The code will track accuracy over the course of training, but some joints improve very quickly and saturate to high levels of performance (the head and neck for example). It is better to only track the average accuracy of a subset of the more difficult joints defined by this variable.
  • flipRef: When doing data augmentation the image may be flipped, in which case whether a joint is associated with the left or right side of the body is flipped as well. This tells us which output channels to swap so as not to mix up left and right (e.g. for MPII, 1 is the right ankle and 6 is the left).
  • skeletonRef: There is some code in img.lua to do visualization of predictions, and this informs which joints to draw and how they are connected. The first two values indicate the joint indices and the third value indicates a color choice.
  2. Important note: Lua is 1-indexed, which should be fine if you are used to MATLAB, but it is easy to get mixed up in the code because of that.
  3. self.annot is used internally, and not accessed by other functions outside of mpii.lua.
  4. Most of the helper functions should be pretty self-explanatory, but let’s take a look at “getPartInfo”. You need to define the keypoint locations and two values: c (center) and s (scale). The keypoints are provided in a 2D tensor (size: nJoints x 2) with the (x,y) location of each joint. If the annotation of a joint is not present, the values (0,0) or (1,1) can be provided telling the code to ignore that joint; it will be ignored during supervision and accuracy evaluation. The values c and s are important as they guide the coordinate transformation performed when generating a sample. We do these transformations to make sure everything lines up when we crop to an individual and when generating the lower resolution ground-truth heatmap. All of the image transformations are handled in the task file, so the job of the dataset file is just to tell the task how it would like the image cropped for a particular sample. The reason cropping to individuals is done at all is because MPII has multiple people worth annotating in a given image.
  5. Lastly, the value returned by normalize is used during accuracy evaluation. Typically the evaluation metric used is PCK (Percentage of Correct Keypoints). This calculates the percentage of predictions that fall within some distance of the ground truth. Since figures generally appear at different sizes in images, a term is introduced to normalize the distance so it is consistent across different people. In this code the normalize function returns an appropriate value to normalize the distance for a sample. It is worth mentioning that during training, an approximation of the accuracy is calculated that uses the distance in the heatmaps and this ignores the normalize value since it assumes the input to the network has already been scale normalized.
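To make the above concrete, here is a rough, hypothetical sketch of what a custom dataset file (say util/dataset/mydata.lua) might look like. Only the fields and methods named above (nJoints, accIdxs, flipRef, skeletonRef, self.annot, getPartInfo, normalize) come from that description; the class wrapper, the loadMyAnnotations helper, and the specific numbers are placeholders to adapt against mpii.lua in your checkout.

local M = {}
Dataset = torch.class('pose.Dataset', M)

function Dataset:__init()
    self.nJoints = 12                             -- number of joints your dataset annotates
    self.accIdxs = {1,2,3,4,11,12}                -- subset of harder joints used for the accuracy readout
    self.flipRef = {{1,6},{2,5},{3,4}}            -- left/right channel pairs to swap on a horizontal flip
    self.skeletonRef = {{1,2,1},{2,3,1},{4,5,2}}  -- {jointA, jointB, colorIdx} for drawing predictions

    -- annotations stay internal to this file; load them however you like
    self.annot = loadMyAnnotations()              -- placeholder helper
end

function Dataset:getPartInfo(idx)
    local pts = self.annot.part[idx]:clone()      -- nJoints x 2 tensor of (x,y); (0,0) or (1,1) means ignore that joint
    local c   = self.annot.center[idx]:clone()    -- rough center of the person in the image
    local s   = self.annot.scale[idx]             -- rough scale of the person
    return pts, c, s
end

function Dataset:normalize(idx)
    -- per-sample value used to normalize distances for PCK evaluation
    return self.annot.normalize[idx]
end

return M.Dataset

With such a file in place, training would then be launched with the -dataset option mentioned above, e.g. something like th main.lua -dataset mydata (the exact entry-point script name is an assumption here).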

anewell avatar Apr 04 '17 04:04 anewell

So, if we wanted to perform some sort of transfer learning using your existing pre-trained model (umich-stacked-hourglass.t7) and our own dataset that only has 12 joints annotated, would it be possible to train? Could we simply load your pre-trained model, load our dataset, change the nJoints parameter specified above to 12, and then train?

Thanks in advance.

neherh avatar Sep 09 '17 16:09 neherh

Hi @neherh, loading the weights directly isn't possible. When you change the nJoints parameter, the number of kernels in the last conv layers becomes nJoints, so they won't accept weights of a different size. To get around this you would have to initialize the other layers individually from the pretrained model, but AFAIK this is not possible in Torch. I think the best solution is to train the network from scratch. I trained it on the MSCOCO keypoints dataset and the results are satisfying.

See src/model/hg.lua line 57

-- Predicted heatmaps
-- ref.nOutChannels will change if nJoints is set to a different number
local tmpOut = nnlib.SpatialConvolution(opt.nFeats,ref.nOutChannels,1,1,1,1,0,0)(ll)
table.insert(out,tmpOut)
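
For reference, a minimal illustration of that size mismatch using plain nn modules (the 256/16/12 channel counts here are assumed for the example, not taken from the repo's configuration):

require 'nn'
local nFeats  = 256
local mpiiOut = nn.SpatialConvolution(nFeats, 16, 1, 1)  -- weight: 16 x 256 x 1 x 1
local myOut   = nn.SpatialConvolution(nFeats, 12, 1, 1)  -- weight: 12 x 256 x 1 x 1
-- myOut.weight:copy(mpiiOut.weight) -- errors: the tensors hold different numbers of elements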

mkocabas avatar Sep 10 '17 08:09 mkocabas

@anewell , thanks for your detailed introduction, I have two more questions.

  1. During training, if a joint is invisible but still has a coordinate, do you ignore it?
  2. At evaluation time, we don't know the position of a person; do we need to run a person detector first?

argman avatar Mar 13 '18 13:03 argman

@anewell Thanks for the detailed explanation. Could you please elaborate on point 5? In particular, I am confused about how to get the normalized distance ("Since figures...." from point 5). Are there any resources where I can find the detailed procedure for computing the PCK measure?

jovian005 avatar Jul 19 '18 21:07 jovian005