facenet-pytorch
Error while trying to fine-tune pretrained network - train.py
Hi,
I've encountered the following error while trying to train the network on my custom dataset.
UnboundLocalError Traceback (most recent call last)
<ipython-input-25-efef0e88967e> in <module>()
14 training.pass_epoch(
15 resnet, loss_fn, train_loader, optimizer, scheduler,
---> 16 batch_metrics=metrics, show_running=True, device=device,
17 )
18
/usr/local/lib/python3.6/dist-packages/facenet_pytorch/models/utils/training.py in pass_epoch(model, loss_fn, loader, optimizer, scheduler, batch_metrics, show_running, device, writer)
126 scheduler.step()
127
--> 128 loss = loss / (i_batch + 1)
129 metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}
130
UnboundLocalError: local variable 'i_batch' referenced before assignment
The following lines in the pass_epoch function use i_batch outside the for loop in which it is created. Please fix.
Problematic lines:
loss = loss / (i_batch + 1)
metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}
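For context, i_batch is only bound inside the for loop, so if the DataLoader yields zero batches (for example an empty dataset, or every batch being dropped) the loop body never runs and the division raises the UnboundLocalError above. A minimal, self-contained sketch of that Python behavior (the names here are illustrative, not from the library):

import torch
from torch.utils.data import DataLoader, TensorDataset

def average_loss_over(loader):
    loss = 0
    for i_batch, (x, y) in enumerate(loader):
        loss += x.sum()              # never runs when the loader is empty
    return loss / (i_batch + 1)      # i_batch was never assigned -> UnboundLocalError

# An empty dataset yields a DataLoader with zero batches.
empty_ds = TensorDataset(torch.empty(0, 3), torch.empty(0, dtype=torch.long))
average_loss_over(DataLoader(empty_ds, batch_size=32))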
Function definition:
def pass_epoch(
    model, loss_fn, loader, optimizer=None, scheduler=None,
    batch_metrics={'time': BatchTimer()}, show_running=True,
    device='cpu', writer=None
):
    """Train or evaluate over a data epoch.

    Arguments:
        model {torch.nn.Module} -- Pytorch model.
        loss_fn {callable} -- A function to compute (scalar) loss.
        loader {torch.utils.data.DataLoader} -- A pytorch data loader.

    Keyword Arguments:
        optimizer {torch.optim.Optimizer} -- A pytorch optimizer.
        scheduler {torch.optim.lr_scheduler._LRScheduler} -- LR scheduler (default: {None})
        batch_metrics {dict} -- Dictionary of metric functions to call on each batch. The default
            is a simple timer. A progressive average of these metrics, along with the average
            loss, is printed every batch. (default: {{'time': iter_timer()}})
        show_running {bool} -- Whether or not to print losses and metrics for the current batch
            or rolling averages. (default: {False})
        device {str or torch.device} -- Device for pytorch to use. (default: {'cpu'})
        writer {torch.utils.tensorboard.SummaryWriter} -- Tensorboard SummaryWriter. (default: {None})

    Returns:
        tuple(torch.Tensor, dict) -- A tuple of the average loss and a dictionary of average
            metric values across the epoch.
    """
    mode = 'Train' if model.training else 'Valid'
    logger = Logger(mode, length=len(loader), calculate_mean=show_running)
    loss = 0
    metrics = {}

    for i_batch, (x, y) in enumerate(loader):
        x = x.to(device)
        y = y.to(device)
        y_pred = model(x)
        loss_batch = loss_fn(y_pred, y)

        if model.training:
            loss_batch.backward()
            optimizer.step()
            optimizer.zero_grad()

        metrics_batch = {}
        for metric_name, metric_fn in batch_metrics.items():
            metrics_batch[metric_name] = metric_fn(y_pred, y).detach().cpu()
            metrics[metric_name] = metrics.get(metric_name, 0) + metrics_batch[metric_name]

        if writer is not None and model.training:
            if writer.iteration % writer.interval == 0:
                writer.add_scalars('loss', {mode: loss_batch.detach().cpu()}, writer.iteration)
                for metric_name, metric_batch in metrics_batch.items():
                    writer.add_scalars(metric_name, {mode: metric_batch}, writer.iteration)
            writer.iteration += 1

        loss_batch = loss_batch.detach().cpu()
        loss += loss_batch
        if show_running:
            logger(loss, metrics, i_batch)
        else:
            logger(loss_batch, metrics_batch, i_batch)

    if model.training and scheduler is not None:
        scheduler.step()

    loss = loss / (i_batch + 1)
    metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}

    if writer is not None and not model.training:
        writer.add_scalars('loss', {mode: loss.detach()}, writer.iteration)
        for metric_name, metric in metrics.items():
            writer.add_scalars(metric_name, {mode: metric})

    return loss, metrics
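A possible workaround until this is fixed upstream (a hedged sketch, not the library's official patch) is to take the batch count from the loader itself and fail fast on an empty loader, so the averaging step no longer depends on the loop variable:

# Sketch of a defensive edit to pass_epoch (illustrative only):
n_batches = len(loader)
if n_batches == 0:
    raise ValueError(
        'pass_epoch received an empty DataLoader; '
        'check the dataset path, batch_size and drop_last settings.'
    )

# ... training loop unchanged ...

# Replace the i_batch-based averaging with the known batch count:
loss = loss / n_batches
metrics = {k: v / n_batches for k, v in metrics.items()}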
Hi @cepa995, did you figure that out? I was trying to train a custom model but I don't understand the given example. Thanks!
If anybody else has issues with this: I noticed I got the error when my loss_fn and metrics variable names were in uppercase. It went away as soon as I used lowercase names for those two; I do not think anything else was different. I tried on Colab and locally and saw the same results both times.
Edit: it might actually be related to fixed_image_standardization.
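For anyone checking the fixed_image_standardization angle: it expects pixel values in the 0-255 range, so it is normally applied after converting the image to float32 and calling ToTensor (which does not rescale float inputs). A sketch of that setup, with data_dir and batch_size as placeholders; if the resulting loader ends up with zero batches, pass_epoch hits the UnboundLocalError above:

import numpy as np
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from facenet_pytorch import fixed_image_standardization

data_dir = 'data/my_faces_cropped'   # placeholder path
batch_size = 32

trans = transforms.Compose([
    np.float32,                      # PIL image -> float32 array, keeps the 0-255 range
    transforms.ToTensor(),           # float input is not rescaled here
    fixed_image_standardization,     # maps roughly to [-1, 1]
])

dataset = datasets.ImageFolder(data_dir, transform=trans)
train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Guard against the empty-loader case that triggers the error in pass_epoch.
assert len(train_loader) > 0, 'Empty DataLoader: check data_dir and its class subfolders'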