pytorch-deform-conv-v2
About the learning rate setting of p_conv and m_conv
You set the gradients of `p_conv` and `m_conv` to 0.1 times those of the other layers, but I find the gradients do not change after backward. I used the following code to test:
```python
import torch
import torch.nn as nn

from deform_conv_v2 import DeformConv2d  # assuming deform_conv_v2.py from this repo is on the path

# _set_lr as in the repo, with prints added. Note it only rebinds the local
# names to new generator expressions and returns nothing, so the gradients
# autograd actually uses are never modified.
def _set_lr(module, grad_input, grad_output):
    print('grad input:', grad_input)
    print('grad output:', grad_output)
    grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input)))
    grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output)))

x = torch.randn(4, 3, 5, 5)
y_ = torch.randn(4, 1, 5, 5)
loss = nn.L1Loss()

d_conv = DeformConv2d(inc=3, outc=1, modulation=True)
# register the printing hook explicitly (DeformConv2d registers its own
# _set_lr internally, but that one has no print statements)
d_conv.p_conv.register_backward_hook(_set_lr)
d_conv.m_conv.register_backward_hook(_set_lr)

y = d_conv(x)  # call the module rather than .forward() directly
l = loss(y, y_)
l.backward()

print('p conv grad:')
print(d_conv.p_conv.weight.grad)
print('m conv grad:')
print(d_conv.m_conv.weight.grad)
print('conv grad:')
print(d_conv.conv.weight.grad)
```
The gradient of `p_conv` is the same as `grad_input`, but I would expect it to be 0.1 times `grad_input`. Am I wrong?
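If it helps, the same no-op behavior can be reproduced with a plain `nn.Conv2d`, without this repo's code at all (a minimal sketch; note that `register_backward_hook` is deprecated in newer PyTorch in favor of `register_full_backward_hook`):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

def noop_hook(module, grad_input, grad_output):
    # rebinding the local name is purely local; since the hook returns
    # nothing, autograd keeps the original grad_input unchanged
    grad_input = tuple(g * 0.1 if g is not None else None for g in grad_input)

conv.register_backward_hook(noop_hook)

out = conv(torch.randn(1, 3, 5, 5))
out.sum().backward()
print(conv.weight.grad)  # unscaled, confirming the hook is a no-op
```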
You're right! I'll fix it.
Have you solved this problem now?
@dontLoveBugs Hello, can you review my issue? I think the bilinear kernel is wrong.
You're right! I'll fix it.
A `tuple` object cannot be modified in place, and your code just creates a generator.
I have searched online: `grad_output` cannot be modified, but if you want to modify `grad_input` you need to return the modified gradients from the hook, like:

```python
def _set_lr(module, grad_input, grad_output):
    return (grad_input[i] * 0.1 for i in range(len(grad_input)))
```

You can try it. My question is: why scale the `p_conv` gradients at all? Is it to avoid disturbing the learning of the other feature extraction branch?
@XinZhangNLPR The error you get is because the backward hook expects a tuple to be returned, not a generator.
> if you want to modify `grad_input` you need to return the modified gradients from the hook, like: `def _set_lr(module, grad_input, grad_output): return (grad_input[i] * 0.1 for i in range(len(grad_input)))` ... you can try it.
Your suggestion still returns a generator, not a tuple.
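For completeness, a version that actually returns a tuple might look like this (a sketch; `None` entries, e.g. for inputs that don't require grad, should be passed through unchanged):

```python
def _set_lr(module, grad_input, grad_output):
    # Returning a tuple makes autograd replace grad_input with the scaled
    # version; None entries must be kept as-is.
    return tuple(g * 0.1 if g is not None else None for g in grad_input)
```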
You're right! I'll fix it.
It seems this bug still has not been fixed.
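For anyone still hitting this: an alternative that avoids backward hooks entirely is to put `p_conv` and `m_conv` in their own optimizer parameter group with a 0.1x learning rate (a sketch, assuming `d_conv` is a `DeformConv2d` instance as above and `base_lr` is your chosen base learning rate):

```python
import torch

base_lr = 1e-3  # assumed base learning rate
optimizer = torch.optim.SGD([
    {'params': d_conv.conv.parameters()},                         # main conv: base lr
    {'params': d_conv.p_conv.parameters(), 'lr': 0.1 * base_lr},  # offset branch
    {'params': d_conv.m_conv.parameters(), 'lr': 0.1 * base_lr},  # modulation branch
], lr=base_lr)
```

For plain SGD this is exactly equivalent to scaling those gradients by 0.1.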