kpt_coordinate and kpt_offset_yx question
Hi, thank you for your great work on this repo!
I succeeded in running inference on my example images, but have a few questions about retraining. I want to retrain MoveNet on a custom dataset (with 26 joints) and, as a first step, am trying to overfit on a few samples. The loss is decreasing, but so far I have not been able to overfit and display the results correctly.
Could you please kindly explain this line:
https://github.com/lee-man/movenet/blob/91864750dccbfc43fa0460f6bb80c6a2ccb86d36/src/lib/models/networks/movenet.py#L161
Do I get it right that the coordinate is a value on the 64x64 output grid and the offset should be ±2, which we then add to the final coordinate (grid index * 4)? If this is correct, I am not sure I understand how this line reflects that.
Thanks a lot!
If the size of the original image is 256*256, kpt_coordinate will be a value on the 64x64 output grid (output stride 4) and kpt_offset_yx will be an offset value (possibly in the range [0, 1], I am not 100% certain about this point). And 1/size will be `1/64`. So the output of this line will be the coordinate in the range [0, 1].
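To make it concrete, here is a minimal sketch of how I read that line, assuming kpt_coordinate holds the argmax grid indices on the 64x64 heatmap and kpt_offset_yx is a sub-cell offset in [0, 1] that has already been gathered per joint; the shapes and names here are illustrative, not the exact ones used in the repo:

```python
import numpy as np

def decode_keypoints(heatmaps, offsets, size=64):
    """Illustrative decoding.
    heatmaps: (num_joints, size, size) keypoint heatmaps.
    offsets:  (num_joints, 2) per-joint (y, x) offsets, assumed in [0, 1]."""
    num_joints = heatmaps.shape[0]
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)
    # Grid indices (y, x) of the heatmap peak on the 64x64 output grid.
    ys, xs = np.divmod(idx, size)
    kpt_coordinate = np.stack([ys, xs], axis=1).astype(np.float32)
    # Add the sub-cell offset and divide by the grid size, giving
    # normalized coordinates in [0, 1].
    return (kpt_coordinate + offsets) * (1.0 / size)
```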
By the way, I also tried to fine-tune MoveNet on a custom dataset and could not get a great result. If you get some good results, I look forward to exchanging further with you. Thanks!
In my understanding, kpt_offset_yx should be an offset in the range [-stride/2, stride/2] ([-2, +2] in our case) that compensates for the stride.
If so, it seems that the offset and the grid values are not on the same scale and should not be added as-is, but rather with a coefficient, something like:
kpt_coordinate = (kpt_offset_yx/stride + kpt_coordinate) * (1/size)
Does that make sense?
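To illustrate what I mean, here is a sketch of the alternative decoding under my assumption that the offset head regresses pixel-scale values of roughly ±2 (the function name and shapes are mine, not the repo's):

```python
import numpy as np

def decode_with_pixel_offsets(kpt_coordinate, kpt_offset_yx, stride=4, size=64):
    """kpt_coordinate: (num_joints, 2) grid indices on the 64x64 output map.
    kpt_offset_yx:     (num_joints, 2) offsets assumed to be in pixels, ~[-2, 2].
    Dividing the offset by the stride converts it to grid units, so both terms
    are on the same scale before normalizing to [0, 1]."""
    return (kpt_coordinate + kpt_offset_yx / stride) * (1.0 / size)
```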
Sure, I will be happy to collaborate and will share if I make some progress.
In the meantime, I disabled augmentations and all losses except the focal loss for the hm_hp heatmaps, and I am trying to overfit on just a few images.
For some reason, the peak values of the output grid never approach 1, and the convergence stops at some point. Did you see something similar in your experiments?
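For reference, this is the heatmap focal loss I am keeping, written as a sketch of the standard CenterNet-style formulation; I assume the repo's implementation is equivalent, and alpha/beta here are the usual defaults rather than values confirmed from this codebase:

```python
import torch

def heatmap_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """CenterNet-style focal loss for Gaussian keypoint heatmaps.
    pred: (B, J, H, W) predicted heatmaps after sigmoid.
    gt:   (B, J, H, W) target heatmaps with Gaussian peaks equal to 1."""
    pos_mask = gt.eq(1).float()                 # exact peak locations
    neg_mask = gt.lt(1).float()                 # everything else
    neg_weights = torch.pow(1 - gt, beta)       # down-weight pixels near a peak

    pos_loss = torch.log(pred + eps) * torch.pow(1 - pred, alpha) * pos_mask
    neg_loss = torch.log(1 - pred + eps) * torch.pow(pred, alpha) * neg_weights * neg_mask

    num_pos = pos_mask.sum()
    loss = -(pos_loss.sum() + neg_loss.sum())
    # Normalize by the number of positive (peak) pixels.
    return loss / torch.clamp(num_pos, min=1.0)
```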
Sorry for the late reply. We did find some problems in the label construction for the keypoint offsets. I will post an update once we fix this problem.