Realtime_Multi-Person_Pose_Estimation.PyTorch

train fpn/train_pose.py

Open wulixunhua opened this issue 6 years ago • 7 comments

Traceback (most recent call last):
  File "train_pose.py", line 313, in <module>
    train_val(model, args)
  File "train_pose.py", line 180, in train_val
    vec1, heat1, vec2, heat2 = model(input_var, mask_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "../../model/fpn.py", line 231, in forward
    [p2_out, p3_out, p4_out, p5_out, p6_out] = self.fpn(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "../../model/fpn.py", line 47, in forward
    p4_out = self.P4_conv1(c4_out) + F.upsample(p5_out, scale_factor=2)
RuntimeError: The size of tensor a (23) must match the size of tensor b (24) at non-singleton dimension 3


The feature map sizes go 96 -> 46 -> 23 -> 12. Upsampling the 12 by a factor of 2 gives 24, which does not match the 23 it is added to. How can I fix this?

wulixunhua, May 08 '18 01:05

Adjust your input resolution to 384 so that it is divisible by 32.
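To see why 384 works, here is a minimal sketch of the size arithmetic. It assumes each backbone stage halves the spatial size with rounding up (as a stride-2 convolution or pooling layer with ResNet-style padding does), and that the failing input was 368 pixels wide. The 368 is a guess based on the 46 -> 23 -> 12 sizes in the question, and the helper names are hypothetical.

```python
def halve(n):
    # A stride-2 layer with ResNet-style padding outputs ceil(n / 2).
    return (n + 1) // 2


def pyramid_sizes(input_size, levels=5):
    """Spatial sizes at strides 2, 4, 8, 16, 32 for a square input."""
    sizes = []
    n = input_size
    for _ in range(levels):
        n = halve(n)
        sizes.append(n)
    return sizes


def top_down_mismatches(input_size):
    """List (finer_size, upsampled_coarser_size) pairs that cannot be added."""
    sizes = pyramid_sizes(input_size)[1:]  # strides 4..32, i.e. the FPN inputs
    bad = []
    for finer, coarser in zip(sizes[:-1], sizes[1:]):
        if coarser * 2 != finer:
            bad.append((finer, coarser * 2))
    return bad


print(pyramid_sizes(368), top_down_mismatches(368))
print(pyramid_sizes(384), top_down_mismatches(384))
```

Under these assumptions, 368 gives a single mismatch of 23 against 24, the same pair reported in the RuntimeError, while 384 halves cleanly at every level.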

Xiangyu-CAS, May 08 '18 11:05

Traceback (most recent call last):
  File "train_pose.py", line 313, in <module>
    train_val(model, args)
  File "train_pose.py", line 180, in train_val
    vec1, heat1, vec2, heat2 = model(input_var, mask_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "../../model/fpn.py", line 245, in forward
    vec2 = self.predict_L1_stage2(out2)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: Given groups=1, weight[38, 128, 1, 1], so expected input[2, 588, 48, 48] to have 128 channels, but got 588 channels instead


Thanks, but that raises a new problem: the channels do not match.

wulixunhua, May 09 '18 08:05

I have solved the previous problem. I think your code has some bugs, so I modified the Pose_Estimation class in fpn.py. PS: I marked the modified parts with "####################".


import math
import os

import torch
import torch.nn as nn

# ResNet and FPN are the backbone classes from the repository's model code.


class Pose_Estimation(nn.Module):

    def __init__(self, vec_num, heat_num):
        super(Pose_Estimation, self).__init__()
        resnet = ResNet('resnet50', stage5=True)
        C1, C2, C3, C4, C5 = resnet.stages()
        self.fpn = FPN(C1, C2, C3, C4, C5, out_channels=256)
        self.block1 = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 512, kernel_size=1, stride=1, padding=0),
            nn.ReLU(inplace=True)
        )
        self.predict_L1_stage1 = nn.Conv2d(512, vec_num, kernel_size=1, stride=1, padding=0)
        self.predict_L2_stage1 = nn.Conv2d(512, heat_num, kernel_size=1, stride=1, padding=0)

        self.block2 = nn.Sequential(       ############## modify: 512 + vec_num + heat_num -> 569
            nn.Conv2d(569, 128, kernel_size=7, stride=1, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=7, stride=1, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=7, stride=1, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=7, stride=1, padding=3),
            nn.ReLU(inplace=True)
        )

        self.predict_L1_stage2 = nn.Conv2d(128, vec_num, kernel_size=1, stride=1, padding=0)
        self.predict_L2_stage2 = nn.Conv2d(128, heat_num, kernel_size=1, stride=1, padding=0)

        self.initialize_weights()

    def initialize_weights(self):
        """Initialize model weights."""
        print("----------------initialize_weights-----------")
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # He-style initialization scaled by kernel area and output channels
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

    def load_weights(self, filepath):
        """Modified version of the corresponding Keras function with
        the addition of multi-GPU support and the ability to exclude
        some layers from loading.
        exclude: list of layer names to exclude
        """
        if os.path.exists(filepath):
            self.load_state_dict(torch.load(filepath))
        else:
            print("Weight file not found ...")

    def forward(self, x, mask):
        print("----------------forward-----------")
        [p2_out, p3_out, p4_out, p5_out, p6_out] = self.fpn(x)

        out1 = self.block1(p3_out)
        #print("out1:",out1)   # 2x512x48x48
        vec1 = self.predict_L1_stage1(out1)
        heat1 = self.predict_L2_stage1(out1)   ######################  modify : L1->L2
        #print("vec1:",vec1)   # 2x38x48x48
        #print("heat1:",heat1) # 2x19x48x48

        vec1_mask = vec1 * mask
        heat1_mask = heat1 * mask
        #print("vec1_mask:",vec1_mask)   # 2x38x48x48
        #print("heat1_mask:",heat1_mask) # 2x19x48x48

        # Stage 2 input: stage-1 predictions concatenated with the block1 features
        out2 = torch.cat([vec1, heat1, out1], 1)
        #print("out2:",out2)   # 2x569x48x48
        out2 = self.block2(out2)      ################################# add this
        #print("--------------------------")

        vec2 = self.predict_L1_stage2(out2)
        heat2 = self.predict_L2_stage2(out2)   #########################  modify:  stage1->stage2
        #print("vec2:",vec2)       # 2x38x48x48
        #print("heat2:",heat2)     # 2x19x48x48

        vec2_mask = vec2 * mask
        heat2_mask = heat2 * mask
        #print("vec2_mask:",vec2_mask)    # 2x38x48x48
        #print("heat2_mask:",heat2_mask)  # 2x19x48x48
        print("----------------forward done-----------")
        return vec1_mask, heat1_mask, vec2_mask, heat2_mask

Could you check whether my modifications are correct? Thank you very much! When I run this code, the loss is NaN. Can you give me some guidance?
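For reference, the channel counts behind the second error can be checked by hand. This is a small sketch, assuming vec_num = 38 and heat_num = 19 (the shapes printed in the code comments above) and that the unmodified code produced heat1 with the L1 head, as the "modify: L1->L2" marker suggests. The constant and function names are hypothetical.

```python
VEC_NUM = 38      # channels of the L1 (vector field) prediction
HEAT_NUM = 19     # channels of the L2 (heatmap) prediction
BLOCK1_OUT = 512  # channels produced by block1


def concat_channels(*channel_counts):
    # torch.cat along dim 1 simply adds up the channel counts.
    return sum(channel_counts)


# Unmodified code (assumed): heat1 came from the L1 head, so it also had 38 channels.
buggy = concat_channels(VEC_NUM, VEC_NUM, BLOCK1_OUT)

# Modified code: heat1 comes from the L2 head with 19 channels.
fixed = concat_channels(VEC_NUM, HEAT_NUM, BLOCK1_OUT)

print(buggy, fixed)
```

The first sum reproduces the 588 channels reported in the traceback, and the second gives the 569 input channels that the first convolution of block2 expects.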

wulixunhua, May 09 '18 11:05

Yes, you are right, there are some bugs in this code. When I updated the code, the FPN experiments were not yet complete. Your modifications seem reasonable. I think I also forgot to load the pretrained ResNet model in this code, which may be the cause of the NaN.

You can refer to https://github.com/Xiangyu-CAS/Heatmap/blob/master/models/CPM_FPN.py, which is an FPN for single-person pose estimation.

Or, you can just run the baseline. https://github.com/Xiangyu-CAS/Realtime_Multi-Person_Pose_Estimation.PyTorch/blob/master/model/vgg_1branch.py
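One common way to load a pretrained backbone into a larger model is to copy only the entries whose names and shapes match. The sketch below shows that selection step on plain dicts of name -> shape so it runs without PyTorch. The layer names and shapes are made up for illustration and are not taken from this repository.

```python
def select_loadable(pretrained_shapes, model_shapes):
    """Split pretrained entries into those the model can take and those it cannot."""
    loadable, skipped = [], []
    for name, shape in pretrained_shapes.items():
        # Keep an entry only if the model has the same name with the same shape.
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# Hypothetical names and shapes, for illustration only.
pretrained = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (1000, 2048)}
model = {"conv1.weight": (64, 3, 7, 7), "predict_L1_stage1.weight": (38, 512, 1, 1)}

loadable, skipped = select_loadable(pretrained, model)
print(loadable, skipped)
```

With real checkpoints, the shapes would come from each tensor's `.shape`. The selected entries could then be merged into the model's own `state_dict()` and passed to `load_state_dict`, assuming the checkpoint names line up with the backbone's parameter names.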

Xiangyu-CAS, May 10 '18 05:05

I don't have the resnet.pth pretrained model. Can you give me a link to it (Baidu Cloud or Google)? My coding ability is not great. If you have spare time, could you revise the part of cpm-fpn.py that loads the pretrained weights? I am not sure I can finish it myself, but I will try my best. Thank you very much! @Xiangyu-CAS

wulixunhua, May 11 '18 00:05

Pretrained model link: https://github.com/aaron-xichen/pytorch-playground

Xiangyu-CAS, May 11 '18 11:05

"-------------------load resnet50--------------------------------
loading model from /home/xuedingyu1/zhangbenben/RMPPE-pytorch/model/resnet50-19c8e357.pth"

I modified fpn.py and loaded the pretrained ResNet50 model successfully, but the loss is still NaN. Why is that? Looking forward to your reply, thank you!


When I run ResNet alone as the base network, it works well. But when I combine ResNet with the FPN, it does not work.

wulixunhua, May 12 '18 12:05