EfficientNet-PyTorch
Gradient overflow with apex using EfficientNet-b2, b3, and b4
Hi,
I was trying to train EfficientNet-b2, -b3, and -b4 with apex mixed precision, using EfficientNet as the backbone for a segmentation task on Cityscapes. Unfortunately, after a few iterations the loss scale drops to zero because of repeated gradient overflow. This does not happen with the pre-defined backbones from torchvision (`import torchvision.models as models`; I tried ResNet and DenseNet).
The relevant snippet of my code is the following:
```python
optimizer = optim.Adam(model.parameters())
model, optimizer = apex.amp.initialize(model, optimizer)
...
loss = Cross_entropy(y_pred, y_gt)
# Scale the loss before backward so fp16 gradients don't underflow
with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()
```
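For context on why the scale ends up at zero: apex's dynamic loss scaler halves the scale every time a scaled gradient overflows to inf/nan, and only grows it back after a run of overflow-free steps. Below is a minimal pure-Python sketch of that mechanism (the class name and constants here are hypothetical, not apex's actual implementation) showing how persistent overflow drives the scale toward zero:

```python
import math

class DynamicLossScaler:
    """Toy model of dynamic loss scaling (hypothetical, not apex's code)."""

    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        # If any gradient overflowed, shrink the scale and tell the
        # caller to skip this optimizer step.
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        # Otherwise, grow the scale back after enough clean steps.
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True

scaler = DynamicLossScaler()
# Simulate the failure mode in this issue: every backward pass
# overflows, so the scale is halved each step and decays toward zero.
for _ in range(40):
    scaler.update([float("inf")])
print(scaler.scale)  # 2**15 * 0.5**40 = 2**-25, vanishingly small
```

One consequence visible in the sketch: when every step overflows, the step is skipped each time, so the model never actually updates, which matches the stalled training described above.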
System:
- OS: Ubuntu 16.04 and 18.04
- PyTorch: tried with 1.4, 1.5, and 1.6
- apex: 0.1
- EfficientNet: installed from source
Has anyone experienced the same? Thanks