FaPN-full
dcn_v2 error RuntimeError: expected scalar type Float but found Half
When running the network, I encountered this problem. Through debugging, I found that the offset in DCN's forward function has dtype float16, so I think this might be the cause of the problem. Do you have a better solution for this?
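For context, a minimal stand-alone reproduction of this kind of dtype mismatch (not FaPN's actual code): feeding a float16 tensor into an op whose other operand is float32 raises the same class of RuntimeError that the float-only DCNv2 kernel raises when it receives a half-precision offset under mixed precision.

```python
import torch

a = torch.randn(3, 3)          # float32, e.g. a weight kept in full precision
b = torch.randn(3, 3).half()   # float16, e.g. an offset produced under AMP

try:
    torch.mm(a, b)             # mixed Float/Half operands are rejected
except RuntimeError as e:
    print(type(e).__name__)    # RuntimeError, analogous to
                               # "expected scalar type Float but found Half"
```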
```python
class DCN(DCNv2):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, deformable_groups=1, extra_offset_mask=False):
        super(DCN, self).__init__(in_channels, out_channels, kernel_size,
                                  stride, padding, dilation, deformable_groups)
        self.extra_offset_mask = extra_offset_mask
        channels_ = self.deformable_groups * 3 * self.kernel_size[0] * self.kernel_size[1]
        self.conv_offset_mask = nn.Conv2d(self.in_channels, channels_,
                                          kernel_size=self.kernel_size,
                                          stride=self.stride,
                                          padding=self.padding,
                                          bias=True)
        self.init_offset()

    def init_offset(self):
        self.conv_offset_mask.weight.data.zero_()
        self.conv_offset_mask.bias.data.zero_()

    def forward(self, input, main_path=None):
        if self.extra_offset_mask:
            out = self.conv_offset_mask(input[1])
            input = input[0]
        else:
            out = self.conv_offset_mask(input)
        # each chunk has deformable_groups * kernel_size[0] * kernel_size[1] channels
        o1, o2, mask = torch.chunk(out, 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)  # x, y offsets; [0-8]: the first group
```
@KingWangJL Many thanks for your interest in our work. We also find this problem when we train our models with Apex Mixed Precision. However, we still have not found any good solution to this problem now. For now, we just train the model with full precision.
Thanks for your reply. I directly modified the DCNv2 source code, and the model now trains normally, but I don't think this is a good solution; it only makes the code run.
I have successfully trained the model using apex.amp and got comparable results. You can add `@amp.float_function` on top of the forward and backward functions of the modules in DCNv2. Maybe you can refer to https://github.com/CharlesShang/DCNv2/pull/50
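The decorator route above pins DCNv2's autograd functions to fp32 under apex. A sketch of the PyTorch-native equivalent, `torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)` together with `custom_bwd`, using a dummy `autograd.Function` standing in for the real `dcn_v2` op (the class and its body are illustrative, not FaPN's code):

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class DCNv2Like(torch.autograd.Function):
    # stand-in for the real dcn_v2 forward/backward CUDA ops
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)
    # under autocast, floating-point CUDA inputs are cast to fp32
    # before forward runs, so the kernel never sees Half tensors
    def forward(ctx, input, offset, mask):
        return input * mask.mean() + offset.mean()

    @staticmethod
    @custom_bwd  # backward runs with autocast disabled, matching forward
    def backward(ctx, grad_output):
        return grad_output, None, None

x = torch.randn(1, 4, 8, 8, requires_grad=True)
offset = torch.randn(1, 18, 8, 8)
mask = torch.sigmoid(torch.randn(1, 9, 8, 8))
out = DCNv2Like.apply(x, offset, mask)
```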
@LeoniusChen Cool! Thanks for your sharing!
@LeoniusChen By the way, could you please share the final results when apex is used?
I only applied apex.amp to test the Cityscapes semantic segmentation (PointRend + FaPN R50) task. Here is the result.
Noted. Thanks again for your interest in our work. By the way, compared to the results in our paper, this result is not as good.