
kpt_coordinate and kpt_offset_yx question

Open lizagh opened this issue 3 years ago • 5 comments

Hi, Thank you for your great work on this repo!

I managed to run inference on my example images, but I have a few questions regarding retraining. I want to retrain MoveNet on a custom dataset (with 26 joints), and as a first step I am trying to overfit on a few samples. The loss is decreasing, but so far I have not been able to overfit or display the results correctly.

Could you please kindly explain this line:

https://github.com/lee-man/movenet/blob/91864750dccbfc43fa0460f6bb80c6a2ccb86d36/src/lib/models/networks/movenet.py#L161

Do I get it right that the coordinate is a value on the 64x64 output grid and the offset should be within ±2, which we add to the final coordinate (grid index * 4)? If this is correct, I am not sure I understand how this line reflects it.

Thanks a lot!

lizagh avatar Dec 26 '21 22:12 lizagh

If the original image is 256x256, kpt_coordinate will be a value on the 64x64 output grid (output stride 4) and kpt_offset_yx will be an offset value (possibly in the range [0, 1]; I am not 100% certain about this point). 1/size will be `1/64`, so the output of this line will be a coordinate in the range [0, 1].

lee-man avatar Dec 27 '21 03:12 lee-man

By the way, I also tried to fine-tune the MoveNet on the custom dataset and could not get a great result. If you get some good results, I look forward to having further exchanges with you. Thanks!

lee-man avatar Dec 27 '21 03:12 lee-man

> If the original image is 256x256, kpt_coordinate will be a value on the 64x64 output grid (output stride 4) and kpt_offset_yx will be an offset value (possibly in the range [0, 1]; I am not 100% certain about this point). 1/size will be `1/64`, so the output of this line will be a coordinate in the range [0, 1].

In my understanding, kpt_offset_yx should be an offset in the range [-stride/2, stride/2] ([-2, +2] in our case) that compensates for the stride. If so, the offset and grid values are not on the same scale and should not be added as is, but with a coefficient, something like: `kpt_coordinate = (kpt_offset_yx / stride + kpt_coordinate) * (1 / size)`. Does that make sense?
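To make the scale question concrete, here is a minimal sketch of the two decoding conventions discussed in this thread. It is not the repository's code: the function names are hypothetical, and `size=64` and `stride=4` are the values assumed above for a 256x256 input. Which convention the trained offset head actually follows is exactly the open question here.

```python
# Hypothetical helpers for illustration only; not taken from the movenet repo.

def decode_offset_in_cells(grid_yx, offset_yx, size=64):
    """Offset expressed in grid cells: normalized = (grid + offset) / size."""
    return tuple((g + o) / size for g, o in zip(grid_yx, offset_yx))


def decode_offset_in_pixels(grid_yx, offset_yx, size=64, stride=4):
    """Offset expressed in input pixels: normalized = (grid + offset / stride) / size."""
    return tuple((g + o / stride) / size for g, o in zip(grid_yx, offset_yx))


grid_yx = (10, 20)  # peak location (row, column) on the 64x64 grid

# The same sub-cell shift written in both units:
# 0.5 cells equals 2 pixels when the stride is 4.
from_cells = decode_offset_in_cells(grid_yx, (0.5, 0.25))
from_pixels = decode_offset_in_pixels(grid_yx, (2.0, 1.0))

# Reading a pixel-scale offset as if it were in grid cells shifts the result.
mismatched = decode_offset_in_cells(grid_yx, (2.0, 1.0))

# Back to input pixels for a 256x256 image.
to_pixels = lambda yx: tuple(v * 256 for v in yx)
print(to_pixels(from_cells), to_pixels(from_pixels), to_pixels(mismatched))
```

Both conventions agree as long as the offset labels and the decoding use the same unit. If the labels are built in pixels but decoded as grid cells (or the other way around), the predicted keypoints land several pixels away from the target even when the heatmap peak is correct.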

lizagh avatar Dec 28 '21 20:12 lizagh

> By the way, I also tried to fine-tune the MoveNet on the custom dataset and could not get a great result. If you get some good results, I look forward to having further exchanges with you. Thanks!

Sure, I will be happy to collaborate and will share if I have some advancement.
In the meantime, I disabled augmentations and all losses except the focal loss for the hm_hp heatmaps, and I am trying to overfit on just a few images. For some reason, the peak values of the output grid never approach 1, and convergence stops at some point. Did you see something similar in your experiments?

lizagh avatar Dec 28 '21 20:12 lizagh

Sorry for the late reply. We did find some problems in the label construction for the keypoint offset. I will post an update once we fix this problem.

lee-man avatar Jan 06 '22 06:01 lee-man