movenet.pytorch
weights for the movenet_mobilenetv3.py
Hi, I am not able to match the pre-trained weights for the above model file from the output path. Could you guide me on how I can use the v3 model with its pre-trained weights? Thanks
v3 is just for testing and its accuracy is lower than v2 in my tests; also, the original MoveNet backbone is v2, so I suggest v2 if you want to use this repo. If you still want to try v3, you can change https://github.com/fire717/movenet.pytorch/blob/bbc81408bd4da49789d912fd08635355fe123e60/lib/init.py#L7-L9
from lib.models.movenet_mobilenetv2 import MoveNet
to
from lib.models.movenet_mobilenetv3 import MoveNet
Thanks for getting back. I have a few questions regarding the inputs and outputs; I want to test it with my own dataset.
- So the inputs are just (1, 3, 192, 192), or is there something more?
- The outputs are a list of 4 values. I have used index 0 for plotting heatmaps, similar to the code below (I also included how I call it). Is this the right way to do it? What are the remaining outputs used for? They seem like heatmaps themselves.
- Can I use an MSE loss function to fine-tune it, in comparison with the joint bone loss? Are your targets heatmaps in this problem setting?
- What is the difference between the two weights in the output?
- How many epochs did you train this model for, and what was the final accuracy on some test data?
- Have you by chance tried to quantize the model via PyTorch to improve inference speed? I see no difference. I thought I could use some reference here. Thanks
import torch

def get_keypoints(heatmaps, thr=0.5):
    # heatmaps: (num_keypoints, h, w) tensor of per-joint heatmaps
    n, h, w = heatmaps.size()
    flat = heatmaps.view(n, -1)
    max_val, max_idx = flat.max(dim=1)
    xx = (max_idx % w).view(-1, 1)   # (-1, 1) for column vector
    yy = (max_idx // w).view(-1, 1)  # (-1, 1) for column vector
    xx[max_val <= thr] = -1
    yy[max_val <= thr] = -1
    keypoints = torch.cat((xx, yy), dim=1)
    keypoints = keypoints.numpy()
    # re-scale them back to the original image size
    x = keypoints[:, 0] * (640 / 48)
    y = keypoints[:, 1] * (640 / 48)
    return x, y
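For reference, this is roughly how I call it (my own variable names; I am assuming the heatmap branch is the first element of the output list and has shape (num_keypoints, 48, 48) after dropping the batch dimension):

heatmaps = torch.rand(17, 48, 48)        # stand-in for outputs[0].squeeze(0) from the model
x, y = get_keypoints(heatmaps, thr=0.5)  # pixel coordinates scaled back to my 640x640 frames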
- As for this repo, the inputs are just (1, 3, 192, 192); it is the MoveNet Lightning version. If you want to use another input size, you need to change the weight matrix, model head, etc.
- Refer to the "MoveNet Architecture" part of the official blog (a rough sketch of the four output branches follows after this list).
- Of course you can try any loss, but it is not a comparison with the bone loss; "this is a multi-task learning" as noted in the README.
- The difference is that they are different models.
- Just the code's default settings.
- In my tests, post-training quantization is not helpful. QAT may help, but I cannot find an easy way to use QAT in PyTorch (Google uses TensorFlow).
Most of your questions can be answered by the README or the source code; just dive in if you are interested in MoveNet!
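As a rough sketch of my understanding of the four outputs (the constructor arguments and exact tensor shapes here are assumptions; check movenet_mobilenetv2.py to confirm the order):

import torch
from lib.models.movenet_mobilenetv2 import MoveNet

model = MoveNet()                      # constructor args omitted; see the model file for the real signature
model.eval()

img = torch.randn(1, 3, 192, 192)      # MoveNet Lightning input size
with torch.no_grad():
    heatmaps, centers, regs, offsets = model(img)
# heatmaps: one low-resolution map per keypoint
# centers:  person-center heatmap
# regs:     center-to-keypoint regression maps
# offsets:  per-keypoint sub-pixel offset maps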
Hey fire717, thank you very much for the detailed reply. I ran into a few more doubts and just wanted to clarify:
How can I fine-tune it for, say, 16 key points instead of 17? When I change the head to match my key-point count, it looks like the model trains from scratch and then overfits. A code example would be nice. Thanks
What are the other_keypoints? In my case I don't have any, just a single individual in all images. How do I substitute this in your data loader code?
In the bone loss, why is the bone IDX only a particular selection of pairs? Say you have 16 key points, but here there are fewer than that. Is there a relationship between the two?
def boneLoss(pred, target):
    # Frobenius norm of the difference between two maps
    def _Frobenius(mat1, mat2):
        return torch.pow(torch.sum(torch.pow(mat1 - mat2, 2)), 0.5)

    _bone_idx = [[0,1],[1,2],[2,3],[3,4],[4,5],[5,6],[2,4]]

    loss = 0
    for bone_id in _bone_idx:
        # a "bone" is the difference between the maps of a pair of keypoints
        bone_pre = pred[:, bone_id[0], :, :] - pred[:, bone_id[1], :, :]
        bone_gt = target[:, bone_id[0], :, :] - target[:, bone_id[1], :, :]
        f = _Frobenius(bone_pre, bone_gt)
        loss += f

    loss = loss / len(_bone_idx) / pred.shape[0]
    return loss
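For what it's worth, this is how I understand its inputs (dummy tensors just to illustrate; the batch size and keypoint count are made up by me):

import torch

pred = torch.rand(2, 7, 48, 48)    # (batch, num_keypoints, h, w) per-keypoint maps
target = torch.rand(2, 7, 48, 48)  # ground-truth maps with the same layout
print(boneLoss(pred, target))      # scalar, averaged over the bone pairs and the batch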
Fine-tuning for 16 key points is a little complicated, because the goal of this repo is to reproduce the original MoveNet. If you want a different input size or number of key points, you should read the article and the source code, understand them, and then change the code yourself.
kps_mask is used to filter unseen points so they do not contribute to the loss (a simplified sketch is below).
The bone IDX is not important; you can try different sets and test.
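Roughly what kps_mask does, as a simplified sketch (this is not the exact code in this repo, just the idea; the helper name and shapes are my own):

import torch

def masked_heatmap_loss(pred, target, kps_mask):
    # kps_mask: (batch, num_keypoints), 1 for labelled keypoints and 0 for unseen ones
    per_kp = ((pred - target) ** 2).mean(dim=(2, 3))  # per-keypoint error, (batch, num_keypoints)
    per_kp = per_kp * kps_mask                        # unseen keypoints contribute nothing
    return per_kp.sum() / kps_mask.sum().clamp(min=1)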
Hey, sorry that I have a lot of questions. Thanks for the detailed repo. I tried to introduce a dummy column with zero values to match the key points from 16 ->17 and set it as not labelled.
- I have some issues with the inputs: my format looks like this for one item. I am not sure how to set other_keypoints and other_centers. How do you get them? My images have only one object per frame, so should I set them to -1?
- How do you calculate the centre? I am just taking ((192//2)/192, (192//2)/192), since I have cropped my images such that the objects are at the centre of the frame.
- I noticed in the TensorDataloader you have written:
#print(keypoints)
#[0.640625 0.7760417 2, ] (21,)
Why are your key points so small (are you dividing them by 192? That works for me, otherwise I get zeros), and shouldn't they be between 0 and 256/192? What does the 21 represent - is it from the COCO dataset? These comments you left in the code snippets are very helpful for navigating the code, especially when passing in other data for testing. Thanks
If you have only one object, other_keypoints should be {}. Yes, the key point values are relative values, e.g. x = 0.64 means absolute value = 0.64 x 192.
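For example, a tiny illustration of that conversion:

img_size = 192
x_rel, y_rel = 0.640625, 0.7760417                 # normalized keypoint values, like in that comment
x_abs, y_abs = x_rel * img_size, y_rel * img_size  # absolute pixel coordinates in the 192x192 input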
As you have so many questions, I think you may not yet know how MoveNet works; maybe you should read the blog first, then read the source code of this repo and understand it yourself.
Thanks for sharing the article. I read through it and it was helpful.
However, when I prepare my data using your label2center code, my centres come out like below, with negative values, even though they are positioned well. I presume they should range from 0 to 1, but the max and min values I get back are very large or negative. I am scaling them down like the key points. My heatmaps look okay though, within range and on point. Any hints on what I could be doing wrong and how to correct it? Thanks
center = [(192//2)/192, (192//2)/192]
heatmaps,sigma = label2heatmap(keypoints, other_keypoints, self.img_size)
cx = min(max(0,int(center[0]*self.img_size//4)),self.img_size//4-1)
cy = min(max(0,int(center[1]*self.img_size//4)),self.img_size//4-1)
#Center heatmap
centers = label2center(cx=cx, cy=cy, other_centers=other_centers, img_size=192, sigma=sigma)#(1, 48, 48)
regs = label2reg(keypoints, cx, cy, self.img_size) #(14, 48, 48)
offsets = label2offset(keypoints, cx, cy, regs, self.img_size)#(14, 48, 48)
labels = np.concatenate([heatmaps,centers,regs,offsets],axis=0)
Also, my loss values are quite large, but the model seems to be learning with the MoveNet loss:
label2center just converts your center point location (x, y) into a heatmap feature map, so if it's not right, it's probably because the (x, y) for your center is not right. The center point is calculated in https://github.com/fire717/movenet.pytorch/blob/master/scripts/make_coco_data_17keypooints.py
and it means the center point of all keypoints of one object (a simplified sketch is below).
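A simplified sketch of that idea (not the exact script; for illustration I assume keypoints come as normalized (x, y, visibility) triplets and only labelled points are averaged):

import numpy as np

def keypoints_center(keypoints):
    # keypoints: (num_keypoints, 3) array of normalized x, y and a visibility flag
    kps = np.asarray(keypoints, dtype=np.float32)
    labelled = kps[kps[:, 2] > 0]   # keep only labelled keypoints
    cx = labelled[:, 0].mean()
    cy = labelled[:, 1].mean()
    return cx, cy                   # normalized center in [0, 1], used as (x, y) for label2center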