EfficientNet-PyTorch

Models trained in v0.6.3 give wrong output in v0.7.0

Open · jamt9000 opened this issue 4 years ago · 3 comments

We have been using EfficientNet 0.6.3 to train models for multi-label classification. When upgrading to 0.7.0 and loading the previously trained weights, the model output is incorrect (basically predicting negative for everything).

Possibly related to #233
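
For reference, a minimal sketch of the kind of check that shows the problem (the checkpoint path and label count are hypothetical, and it assumes the 0.7.0 `from_name` signature that accepts `num_classes` as a keyword override):

```python
import torch
from efficientnet_pytorch import EfficientNet

# Hypothetical label count and checkpoint path, just to illustrate the loading pattern.
model = EfficientNet.from_name('efficientnet-b0', num_classes=14)
state_dict = torch.load('model_trained_with_0.6.3.pth', map_location='cpu')
model.load_state_dict(state_dict)  # the state dict loads; the mismatch is in padding behaviour, not parameter names
model.eval()

with torch.no_grad():
    probs = torch.sigmoid(model(torch.randn(1, 3, 224, 224)))  # multi-label head

print(probs)  # under 0.7.0 these come out near zero for every label with 0.6.3-trained weights
```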

jamt9000 · Jan 09 '21

I got the same problem with my model. After comparing 0.6.3 with the git version, the difference comes from

https://github.com/lukemelas/EfficientNet-PyTorch/blob/3d400a58023086b5c128ecd4b3ea46c129b5988b/efficientnet_pytorch/model.py#L199

and

https://github.com/lukemelas/EfficientNet-PyTorch/blob/3d400a58023086b5c128ecd4b3ea46c129b5988b/efficientnet_pytorch/model.py#L204

Version 0.6.3 uses global_params.image_size only, so every Conv2d gets the same static padding.
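
If I'm reading it right, the consequence is that the static "same" padding is precomputed from whichever image_size the conv class was built with. A rough sketch using the helpers named in the diff (the 224 vs. 15 sizes are only illustrative, and `static_padding` is the attribute I believe `Conv2dStaticSamePadding` stores its precomputed padding in):

```python
from efficientnet_pytorch.utils import get_same_padding_conv2d

# Old behaviour (0.6.3): every conv's padding is computed from the global image size.
Conv2dGlobal = get_same_padding_conv2d(image_size=224)

# New behaviour (0.7.x): a deeper block sees the reduced feature-map size instead
# (15 is just an illustrative odd size a later stage might see).
Conv2dLocal = get_same_padding_conv2d(image_size=15)

old_dw = Conv2dGlobal(40, 40, kernel_size=3, stride=2, groups=40, bias=False)
new_dw = Conv2dLocal(40, 40, kernel_size=3, stride=2, groups=40, bias=False)

# The precomputed "same" padding differs, so a checkpoint trained under one
# scheme gets evaluated with shifted feature maps under the other.
print(old_dw.static_padding)  # e.g. padding (0, 1, 0, 1)
print(new_dw.static_padding)  # e.g. padding (1, 1, 1, 1)
```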

I'm wondering why this change was made. Does the current version match the official TF version?

PS: I have exported EfficientDet weights from TF to PyTorch and, with 0.6.3, got results very close to the TF version.

xuyuan · Jan 12 '21

I see. Here's the relevant diff https://github.com/lukemelas/EfficientNet-PyTorch/compare/396b06b..a78e84e#diff-e10ae68994b282d9b78c1a169f9f1c9d2cab1a2ec4a4da8a1ab23ba8914304b4

```diff
diff --git a/efficientnet_pytorch/model.py b/efficientnet_pytorch/model.py
--- a/efficientnet_pytorch/model.py
+++ b/efficientnet_pytorch/model.py
@@ -143,15 +143,18 @@ class EfficientNet(nn.Module):
             )
 
             # The first block needs to take care of stride and filter size increase.
-            self._blocks.append(MBConvBlock(block_args, self._global_params))
-            if block_args.num_repeat > 1:
+            self._blocks.append(MBConvBlock(block_args, self._global_params, image_size=image_size))
+            image_size = calculate_output_image_size(image_size, block_args.stride)
+            if block_args.num_repeat > 1: # modify block_args to keep same output size
                 block_args = block_args._replace(input_filters=block_args.output_filters, stride=1)
             for _ in range(block_args.num_repeat - 1):
-                self._blocks.append(MBConvBlock(block_args, self._global_params))
+                self._blocks.append(MBConvBlock(block_args, self._global_params, image_size=image_size))
+                # image_size = calculate_output_image_size(image_size, block_args.stride)  # stride = 1
 
         # Head
         in_channels = block_args.output_filters  # output of final block
         out_channels = round_filters(1280, self._global_params)
+        Conv2d = get_same_padding_conv2d(image_size=image_size)
         self._conv_head = Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
         self._bn1 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps)
```
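
So the new code threads a running image_size through block construction and only then builds the head conv. A condensed sketch of that bookkeeping (the per-stage strides below are the b0 values from memory, so treat them as illustrative):

```python
from efficientnet_pytorch.utils import calculate_output_image_size

image_size = 224                                       # input resolution
image_size = calculate_output_image_size(image_size, 2)  # stem, stride 2
for stride in [1, 2, 2, 2, 1, 2, 1]:                   # per-stage strides (b0-style, illustrative)
    image_size = calculate_output_image_size(image_size, stride)
    print(image_size)  # what the remaining blocks of this stage (and the next stage's first block) see

# The head conv is then built with get_same_padding_conv2d(image_size=image_size),
# whereas 0.6.3 built every conv from the original 224.
```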

jamt9000 · Jan 12 '21

I also got different outputs under Python 2 and Python 3. I finally figured out that the difference is caused by the following lines: https://github.com/lukemelas/EfficientNet-PyTorch/blob/7e8b0d312162f335785fb5dcfa1df29a75a1783a/efficientnet_pytorch/utils.py#L189 https://github.com/lukemelas/EfficientNet-PyTorch/blob/7e8b0d312162f335785fb5dcfa1df29a75a1783a/efficientnet_pytorch/utils.py#L190 and https://github.com/lukemelas/EfficientNet-PyTorch/blob/7e8b0d312162f335785fb5dcfa1df29a75a1783a/efficientnet_pytorch/utils.py#L240

The division operation gives different results under Python 2.x and Python 3.x, which influences the padding in the depthwise convolution layers. This may also be the cause of this issue, so anyone hitting it could give this a try.
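
To see the effect in isolation: on Python 2, `/` between ints is floor division, while on Python 3 it is true division, so the `math.ceil` that feeds the padding computation can come out one smaller under Python 2. The numbers below are illustrative, and the formula is only the rough shape of what utils.py computes:

```python
import math

ih, stride, kernel = 15, 2, 3

# Python 3: 15 / 2 == 7.5, ceil -> 8
# Python 2: 15 / 2 == 7 (integer floor division), ceil -> 7
oh = math.ceil(ih / stride)

# "Same"-padding amount derived from the output size.
pad = max((oh - 1) * stride + (kernel - 1) + 1 - ih, 0)
print(oh, pad)  # Python 3: 8, 2   Python 2: 7, 0
```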

tabsun · May 06 '21