learnable-triangulation-pytorch

Issues, notes and documentation while testing on the CMU dataset, using volumetric model

Samleo8 opened this issue 4 years ago • 14 comments

Following the instructions in issues #24 and #19, I was able to successfully test on the CMU Panoptic Dataset with the volumetric model, using the provided pretrained Human3.6M weights (more specifics here). A snapshot of some of the results is below: [images: heatmaps0, keypoints_vis0]

Issues

However, despite following all 4 pointers in #24, I still have problems with some of the keypoint detections (especially the lower-body predictions, which are completely off). [image: 0019]

Is it possible that the pretrained (H36M) model is unable to handle cases where the lower body is truncated, and thus results in the wrong predictions above?

Notes/Documentation

To those who would like to recreate the results and evaluate on the CMU dataset, note that there are many changes that need to be made. I list the important ones below; check my forked repository for the rest.

  1. You will need to create your own custom CMUPanopticDataset class, similar to the Human36MMultiviewDataset class in mvn/datasets/human36m.py (a rough skeleton is sketched after this list). You will also need the ground-truth BBOXes from the link in issue #19, and you will need to generate your own labels file. If you are lazy, follow my pre-processing instructions here, but note that there may be missing documentation here and there.
  2. As noted in issue #24, units are a big issue: CMU Panoptic keypoints are in centimetres while Human3.6M uses millimetres. Since the model was trained on Human3.6M, the predicted keypoints and the ground-truth keypoints need to be brought to the same scale with the appropriate factor.
  3. UPDATE: If, like me, you use the volumetric model without first running the algebraic model, you need to set use_gt_pelvis to true in the YAML config file.
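
A very rough sketch of what such a dataset class ends up doing for items 1 and 2 is below. This is only an illustration: the real class in my fork mirrors the Human36MMultiviewDataset interface, and the field names used here (frames, image_paths, cameras, keypoints_3d, bboxes) are placeholders rather than the repo's exact keys.

```python
import os

import cv2
import numpy as np
from torch.utils.data import Dataset

CM_TO_MM = 10.0  # CMU Panoptic 3D keypoints are in cm; the Human3.6M-trained model works in mm


class CMUPanopticDataset(Dataset):
    """Minimal multiview CMU Panoptic dataset sketch (field names are placeholders)."""

    def __init__(self, labels_path, image_root):
        # labels file produced during pre-processing (bboxes, camera params, GT 3D keypoints)
        self.labels = np.load(labels_path, allow_pickle=True).item()
        self.image_root = image_root

    def __len__(self):
        return len(self.labels["frames"])

    def __getitem__(self, idx):
        frame = self.labels["frames"][idx]

        # Ground-truth 3D keypoints: convert cm -> mm so they match the scale
        # the Human3.6M-pretrained model expects.
        keypoints_3d = np.asarray(frame["keypoints_3d"], dtype=np.float32) * CM_TO_MM

        return {
            "images": [cv2.imread(os.path.join(self.image_root, p)) for p in frame["image_paths"]],
            "cameras": frame["cameras"],   # per-view K, R, t (t rescaled to mm as well)
            "keypoints_3d": keypoints_3d,
            "bboxes": frame["bboxes"],     # ground-truth bounding boxes (issue #19)
        }
```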

For those who are interested, I have updated the documentation in my repository at https://github.com/Samleo8/learnable-triangulation-pytorch.

Samleo8 · May 18 '20 01:05

It seems like there is something wrong with world coordinates.

The model usually learns that the legs are close to the ground if there is no other information. Looking at the pictures, that does not seem to be the case here, so probably there is a bug in the coordinate conversion between CMU and Human3.6M.

yurymalkov · May 18 '20 01:05

Hi thanks for the reply!

Uh, what do you mean by coordinate conversion? I believe that the world coordinates were properly converted when I set the scaling factor and changed the world axes, according to the pointers in #24?

My current hypothesis is that the model is unable to guess joints which are "out of the picture" (leg joints that are missing), and so the heatmaps for those particular joints are either non-existent, or the model guesses that the person is kneeling or sitting instead.

Samleo8 · May 18 '20 02:05

@Samleo8 I would double-check that everything is the same. As far as I remember, the z-axis has a different sign in CMU and Human3.6M, and at some point we had a bug in this part and saw somewhat similar behavior. I can imagine the hypothesis being the case, but I would expect the model to give default coordinates for the feet (e.g. close to the ground).

yurymalkov · May 18 '20 05:05

> @Samleo8 I would double-check that everything is the same. As far as I remember, the z-axis has a different sign in CMU and Human3.6M, and at some point we had a bug in this part and saw somewhat similar behavior.

Hi, thanks again for the reply! You are right about the z-axis having a different sign; indeed I saw this handled in triangulation.py and made sure it was triggered, unless of course I am missing something else as well?
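
Concretely, the conversion I apply before feeding CMU ground truth to the model looks roughly like this (my own sketch, not the exact code in triangulation.py or my fork; the full conversion may also involve an axis permutation):

```python
import numpy as np

CM_TO_MM = 10.0

def cmu_world_to_h36m_world(keypoints_3d_cm):
    """Convert CMU Panoptic world-space keypoints (cm) towards the convention the
    Human3.6M-pretrained model expects: millimetres, with the z-axis sign flipped."""
    keypoints = np.asarray(keypoints_3d_cm, dtype=np.float32) * CM_TO_MM
    keypoints[:, 2] *= -1.0  # z-axis has the opposite sign in CMU vs. Human3.6M (see above)
    return keypoints
```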

> I can imagine the hypothesis being the case, but I would expect the model to give default coordinates for the feet (e.g. close to the ground).

You are actually right: in most cases the model chose the feet to be closer to the floor (see below). The example I gave was a bad one, as it was an "anomaly" compared to the rest. [images: 0001, 0003, 0012]

Samleo8 · May 18 '20 07:05

To confirm the hypothesis, I will try it out on cameras which are able to capture the full body (i.e. no truncation). I'll let you know how it goes!

Samleo8 · May 18 '20 08:05

I have tried it out on camera views which capture the full body. Unfortunately, because of that, the perspectives are more of a bird's-eye view than a frontal view. The results are below: [images]

This time the keypoints are even more off. Could it be that the model is not used to views from such an angle?

Samleo8 · May 18 '20 09:05

> This time the keypoints are even more off. Could it be that the model is not used to views from such an angle?

Apparently, the model is robust against different angles. It seems that the issue is due to some of the cameras being faulty.

> To confirm the hypothesis, I will try it out on cameras which are able to capture the full body (i.e. no truncation). I'll let you know how it goes!

With all cameras capturing the full pose, preliminary results suggest that the model works well on the CMU dataset too! The hypothesis that the earlier failures were due to the lack of a full-body view seems to be correct.

It would be good to train the model so that it knows what to do with occluded body parts.

Some of the results are shown below: [images]

Samleo8 · May 18 '20 09:05

@Samleo8 I am a bit confused. Are you using algebraic or volumetric models?

yurymalkov · May 18 '20 17:05

Oh, sorry I didn't make it clearer; I've since updated the title.

I'm using the volumetric model, but I didn't use the algebraic model to first predict the pelvis positions. Because of this, the use_gt_pelvis flag must be set to true for this to work (see the config snippet below).
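
For reference, the relevant bit of my evaluation config looks something like this (the exact nesting may differ in your YAML; use_gt_pelvis is the flag that matters):

```yaml
model:
  name: "vol"           # volumetric model
  use_gt_pelvis: true   # use the ground-truth pelvis instead of the algebraic model's prediction
```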

Samleo8 · May 19 '20 01:05

@Samleo8 I see. I wonder, how do you get the 2D heatmap distributions?

yurymalkov · May 19 '20 03:05

Thanks for pointing this out, I didn't think much of it before!

Correct me if I am wrong, but the 2D heatmaps seem to come from the 2D backbone that is part of the volumetric model? The checkpoints for this backbone (human36m) were given as pretrained weights~~, likely from the algebraic model?~~

~~Am I therefore right to say that in order to properly evaluate (and train) on the CMU dataset, I need to first run it on the algebraic model to produce a 2D backbone with weights targeted towards the joints that CMU wants?~~

If you are wondering how I visualized the heatmaps, that was done with the visualize_heatmap code that already ships with the repository.
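
For completeness, the overlay boils down to something like the snippet below. This is my own rough equivalent, not the repo's actual visualize_heatmap code:

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np

def overlay_heatmap(image_bgr, heatmap, alpha=0.5):
    """Overlay a single joint heatmap (h, w) on an image (H, W, 3)."""
    h, w = image_bgr.shape[:2]
    heatmap = cv2.resize(heatmap.astype(np.float32), (w, h))                     # upsample to image size
    heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)  # normalise to [0, 1]

    plt.imshow(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    plt.imshow(heatmap, cmap="jet", alpha=alpha)                                  # semi-transparent overlay
    plt.axis("off")
    plt.show()
```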

Samleo8 · May 19 '20 09:05

Hi, @Samleo8! You're correct about the heatmaps. As a backbone for CMU we used a model pretrained on the COCO dataset (from https://github.com/microsoft/human-pose-estimation.pytorch/blob/master/README.md). You still need to evaluate the algebraic model to get positions for the 3D cubes.

karfly · May 20 '20 18:05

Looking at the images above, I think there are 3 possible problems:

  1. Wrong location of the cube. Maybe the person doesn't fully fit into the cube.
  2. Something wrong with the coordinate system. It differs a lot from Human3.6M's, so you'd better double-check that carefully.
  3. Something wrong with the camera parameters (extrinsics and intrinsics).

karfly · May 20 '20 18:05

Hi @karfly, thanks for the reply. Is this in answer to the comment above (https://github.com/karfly/learnable-triangulation-pytorch/issues/75#issuecomment-630007855) or to the problem with the partially occluded body in #76?

  1. This is quite possible, especially considering that I realised the "gt" pelvis may have been referring to the wrong base point index. I'll double-check that.

  2. I've ensured that the parts of your code in triangulation.py where you fixed the coordinate-system issue are being used, and I also double-checked them, so this should be fine.

[image]

  3. There seems to be some issue with this particular camera's intrinsics, as you can see from the failed projection (I verified this with a quick reprojection check, sketched below). I've since ignored this camera (camera 29).
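
The reprojection check for item 3 is roughly the following (a sketch; CMU's calibration gives per-camera K, R and t, with t in centimetres, and distortion is ignored here):

```python
import numpy as np

def project_points(keypoints_3d_cm, K, R, t):
    """Project CMU world-space keypoints (cm) into a camera with a plain pinhole model.
    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3, 1) translation in cm."""
    X = np.asarray(keypoints_3d_cm, dtype=np.float64).T   # (3, num_joints), world coordinates
    x_cam = R @ X + t                                      # world -> camera coordinates
    x_img = K @ x_cam                                      # camera -> image plane (homogeneous)
    return (x_img[:2] / x_img[2]).T                        # (num_joints, 2) pixel coordinates

# Drawing the projected GT joints on each camera's frame makes a bad camera obvious:
# for camera 29 the projections landed nowhere near the person, so I dropped it.
```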

For now, the model is being trained on the CMU dataset (though see the possible issue in #77) and seems to be doing well, if the TensorBoard images are anything to go by; we'll see how that goes!

Samleo8 · May 21 '20 03:05