EfficientNet-PyTorch
Transfer Learning not working
I'm trying to use the pretrained b1 model to train on Places365, but training gets stuck at ~25% accuracy. I used the ImageNet auto-augment policy found here, with the code below.
Dataloaders:
def _get_train_data_loader(batch_size, training_dir, is_distributed, **kwargs):
    logger.info(str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S ")) + "Get train data loader")
    base_dir = '/dev/shm/places365_standard/'
    defaults.device = torch.device('cuda')
    dataset = datasets.ImageFolder(base_dir + "train", transform=transforms.Compose([
        transforms.Resize(224, interpolation=PIL.Image.BICUBIC),
        ImageNetPolicy(),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))]))
    train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size, pin_memory=True,
                                       num_workers=8, sampler=train_sampler)
def _get_test_data_loader(test_batch_size, training_dir, **kwargs):
    logger.info(str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S ")) + "Get test data loader")
    base_dir = '/dev/shm/places365_standard/'
    defaults.device = torch.device('cuda')
    dataset = datasets.ImageFolder(base_dir + "val", transform=transforms.Compose([
        transforms.Resize(224, interpolation=PIL.Image.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))]))
    return torch.utils.data.DataLoader(dataset, batch_size=test_batch_size, num_workers=8,
                                       shuffle=True, pin_memory=True)
Training code:
model = EfficientNet.from_pretrained('efficientnet-b1', num_classes=365).to(device)
# freeze everything except the classification head
for n, p in model.named_parameters():
    if '_fc' not in n:
        p.requires_grad = False
model = torch.nn.parallel.DistributedDataParallel(model)

optimizer = optim.RMSprop(model.parameters(), lr=3e-2, alpha=0.99,
                          eps=1e-08, weight_decay=1e-5, momentum=0.9)
lmbda = lambda epoch: 0.98739
scheduler = optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lmbda)
criterion = nn.CrossEntropyLoss()

best_loss = 10000000
for epoch in range(1, args.epochs + 1):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(non_blocking=True), target.cuda(non_blocking=True)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        if is_distributed and not use_cuda:
            # average gradients manually for multi-machine cpu case only
            _average_gradients(model)
        optimizer.step()
        if batch_idx % (len(train_loader) - 1) == 0 and batch_idx != 0:
            log = 'Train Epoch: {} [{}/{} ({:.0f}%)] Loss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.sampler),
                100. * batch_idx / len(train_loader), loss.item())
            logger.info(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S ") + log)
    test_loss = test(model, test_loader, device)
    scheduler.step()
    if test_loss < best_loss:
        logger.info(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S ") + "Best loss : Saving")
        save_model(model, args.model_dir)
        best_loss = test_loss
Test function:
def test(model, test_loader, device):
    model.eval()
    test_loss = 0
    correct = 0
    crit = nn.CrossEntropyLoss(reduction='sum')  # sum over the batch (size_average=False is deprecated)
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.cuda(non_blocking=True), target.cuda(non_blocking=True)
            output = model(data)
            test_loss += crit(output, target).item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]     # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    logger.info(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S ") +
                'Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
                    test_loss, correct, len(test_loader.dataset),
                    100. * correct / len(test_loader.dataset)))
    return test_loss
I don't know what I'm doing wrong. Any help?
Maybe you froze the '_fc' layer?
I'm not sure about this issue. In general, EfficientNets are very hard to train. For future reference, make sure you can:
- Do transfer learning by freezing all but the last layer (another way to do this is to construct a simple linear model on top of the .extract_features function; see the sketch after this list)
- Overfit on a small percentage of the training data
- Train a different model (e.g. a ResNet) successfully on your full dataset
Then return to trying to train EfficientNet on your full dataset.
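For the first point, a minimal sketch of the "linear model on top of .extract_features" idea, assuming the lukemelas EfficientNet-PyTorch API; the 1280 feature dimension matches efficientnet-b1, and the FrozenBackboneClassifier name is just for illustration:

import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet

class FrozenBackboneClassifier(nn.Module):
    def __init__(self, num_classes=365, feat_dim=1280):
        super().__init__()
        self.backbone = EfficientNet.from_pretrained('efficientnet-b1')
        for p in self.backbone.parameters():
            p.requires_grad = False               # keep the pretrained backbone fixed
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_dim, num_classes)  # the only trainable layer

    def forward(self, x):
        with torch.no_grad():                     # no gradients through the backbone
            feats = self.backbone.extract_features(x)   # (B, feat_dim, H, W)
        feats = self.pool(feats).flatten(1)       # global average pool -> (B, feat_dim)
        return self.fc(feats)

model = FrozenBackboneClassifier().cuda()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

For the overfitting sanity check, something like torch.utils.data.Subset(train_dataset, range(1000)) gives a small fixed slice of the training set that the model should be able to memorize quickly.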
> Maybe you froze the '_fc' layer?

No, I froze all but the '_fc' layer.
Same here. I used a different dataset and also see accuracy around 25%.
I'm also training EfficientNet-B2 on the Places365-Standard dataset. I'm training _swish from the last block (_blocks.22) and freezing the rest. I'm currently at about 40% top-1 accuracy on the validation data. Any good advice on this issue?
@lukemelas's advice is very helpful; try it out.
@gost-sniper @lukemelas In B2, the freeze layer was implemented with blocks 20, 21, 22 and the FC layer. As a result, accuracy increased to nearly 55% on the Places365-Standard data. Accuracy was further improved by decaying the lr at epochs 30, 60, and 90.
@aporo4000 can you show the code used for the training phase?
> In B2, the freeze layer was implemented with blocks 20, 21, 22 and the FC layer. As a result, accuracy increased to nearly 55% on the Places365-Standard data. Accuracy was further improved by decaying the lr at epochs 30, 60, and 90.
Why would freezing the FC layer work? That doesn't seem to make sense.
@crissallan I made a mistake in my wording. In B2, everything except blocks 20, 21, 22 and the FC layer was frozen.
@gost-sniper It's not the most efficient way, but we set up the frozen layers with:
model = EfficientNet.from_pretrained(args.arch, advprop=args.advprop, num_classes=365)
# freeze everything first
for param in model.parameters():
    param.requires_grad = False
# then unfreeze the last three blocks and the classifier
for name, module in model.named_modules():
    if name == '_blocks.20' or \
       name == '_blocks.21' or \
       name == '_blocks.22' or \
       name == '_fc':
        for param in module.parameters():
            param.requires_grad = True
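For the lr decay at epochs 30, 60 and 90 mentioned above, a standard step schedule would look roughly like this (a sketch only; the gamma=0.1 factor and the SGD hyperparameters are assumptions, since the exact optimizer settings were not posted):

import torch.optim as optim

# optimize only the parameters left unfrozen above
params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(params, lr=1e-2, momentum=0.9, weight_decay=1e-5)

# multiply the learning rate by 0.1 at epochs 30, 60 and 90
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(100):
    # ... run one training epoch over the Places365 loader ...
    scheduler.step()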
Hi @gost-sniper, did you fix the problem?
Could you please share your training code with me? ([email protected])
I am facing problems with some code I wrote here.
Thank you!