EfficientNet-PyTorch
Gradient overflow with apex using EfficientNet-b2, b3, and b4
Hi,
I was trying to train EfficientNet-b2, -b3, and -b4 with apex mixed precision, using EfficientNet as the backbone for a segmentation task on Cityscapes. Unfortunately, after a few iterations the loss scale drops to zero because of repeated gradient overflow. This does not happen with the pre-defined backbones from torchvision (`import torchvision.models as models`; I tried ResNet and DenseNet).
The relevant snippet of my code is the following:
```python
optimizer = optim.Adam(model.parameters())
model, optimizer = apex.amp.initialize(model, optimizer)
...
loss = Cross_entropy(y_pred, y_gt)
# Scale the loss before backward so fp16 gradients don't underflow
with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()
```
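For context on why the scale ends up at zero: apex's dynamic loss scaler halves the scale every time a scaled gradient overflows to inf/nan, and only grows it back after a run of overflow-free steps. Below is a minimal pure-Python sketch of that mechanism (the class name and constants here are hypothetical, not apex's actual implementation) showing how persistent overflow drives the scale toward zero:

```python
import math

class DynamicLossScaler:
    """Toy model of dynamic loss scaling (hypothetical, not apex's code)."""

    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        # If any gradient overflowed, shrink the scale and tell the
        # caller to skip this optimizer step.
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        # Otherwise, grow the scale back after enough clean steps.
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True

scaler = DynamicLossScaler()
# Simulate the failure mode in this issue: every backward pass
# overflows, so the scale is halved each step and decays toward zero.
for _ in range(40):
    scaler.update([float("inf")])
print(scaler.scale)  # 2**15 * 0.5**40 = 2**-25, vanishingly small
```

One consequence visible in the sketch: when every step overflows, the step is skipped each time, so the model never actually updates, which matches the stalled training described above.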
System:
- OS: Ubuntu 16.04 and 18.04
- PyTorch: tried with 1.4, 1.5, and 1.6
- apex: 0.1
- EfficientNet: installed from source
Has anyone experienced the same? Thanks