
Error while trying to fine-tune pretrained network - train.py

cepa995 opened this issue 4 years ago · 2 comments

Hi,

I've encountered the following error while trying to train the network on my custom dataset.

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-25-efef0e88967e> in <module>()
     14     training.pass_epoch(
     15         resnet, loss_fn, train_loader, optimizer, scheduler,
---> 16         batch_metrics=metrics, show_running=True, device=device,
     17     )
     18 

/usr/local/lib/python3.6/dist-packages/facenet_pytorch/models/utils/training.py in pass_epoch(model, loss_fn, loader, optimizer, scheduler, batch_metrics, show_running, device, writer)
    126         scheduler.step()
    127 
--> 128     loss = loss / (i_batch + 1)
    129     metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}
    130 

UnboundLocalError: local variable 'i_batch' referenced before assignment

The following lines in the pass_epoch function use i_batch outside the for loop in which it is defined. If the loader yields no batches, the loop body never runs, i_batch is never assigned, and the division below raises UnboundLocalError. Please fix (a possible guard is sketched after the function definition below).

Problematic lines:

loss = loss / (i_batch + 1)
metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}

Function definition:

def pass_epoch(
    model, loss_fn, loader, optimizer=None, scheduler=None,
    batch_metrics={'time': BatchTimer()}, show_running=True,
    device='cpu', writer=None
):
    """Train or evaluate over a data epoch.
    
    Arguments:
        model {torch.nn.Module} -- Pytorch model.
        loss_fn {callable} -- A function to compute (scalar) loss.
        loader {torch.utils.data.DataLoader} -- A pytorch data loader.
    
    Keyword Arguments:
        optimizer {torch.optim.Optimizer} -- A pytorch optimizer.
        scheduler {torch.optim.lr_scheduler._LRScheduler} -- LR scheduler (default: {None})
        batch_metrics {dict} -- Dictionary of metric functions to call on each batch. The default
            is a simple timer. A progressive average of these metrics, along with the average
            loss, is printed every batch. (default: {{'time': iter_timer()}})
        show_running {bool} -- Whether or not to print losses and metrics for the current batch
            or rolling averages. (default: {False})
        device {str or torch.device} -- Device for pytorch to use. (default: {'cpu'})
        writer {torch.utils.tensorboard.SummaryWriter} -- Tensorboard SummaryWriter. (default: {None})
    
    Returns:
        tuple(torch.Tensor, dict) -- A tuple of the average loss and a dictionary of average
            metric values across the epoch.
    """
    
    mode = 'Train' if model.training else 'Valid'
    logger = Logger(mode, length=len(loader), calculate_mean=show_running)
    loss = 0
    metrics = {}

    for i_batch, (x, y) in enumerate(loader):
        x = x.to(device)
        y = y.to(device)
        y_pred = model(x)
        loss_batch = loss_fn(y_pred, y)

        if model.training:
            loss_batch.backward()
            optimizer.step()
            optimizer.zero_grad()

        metrics_batch = {}
        for metric_name, metric_fn in batch_metrics.items():
            metrics_batch[metric_name] = metric_fn(y_pred, y).detach().cpu()
            metrics[metric_name] = metrics.get(metric_name, 0) + metrics_batch[metric_name]
            
        if writer is not None and model.training:
            if writer.iteration % writer.interval == 0:
                writer.add_scalars('loss', {mode: loss_batch.detach().cpu()}, writer.iteration)
                for metric_name, metric_batch in metrics_batch.items():
                    writer.add_scalars(metric_name, {mode: metric_batch}, writer.iteration)
            writer.iteration += 1
        
        loss_batch = loss_batch.detach().cpu()
        loss += loss_batch
        if show_running:
            logger(loss, metrics, i_batch)
        else:
            logger(loss_batch, metrics_batch, i_batch)
    
    if model.training and scheduler is not None:
        scheduler.step()

    loss = loss / (i_batch + 1)
    metrics = {k: v / (i_batch + 1) for k, v in metrics.items()}
            
    if writer is not None and not model.training:
        writer.add_scalars('loss', {mode: loss.detach()}, writer.iteration)
        for metric_name, metric in metrics.items():
            writer.add_scalars(metric_name, {mode: metric})

    return loss, metrics
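
A minimal sketch of one possible guard (my own suggestion, not an official patch; the loop body is elided with ...): keep i_batch defined even when the loader produces no batches, and fail with a clear message in that case.

```python
i_batch = -1  # stays -1 if the loader yields no batches
for i_batch, (x, y) in enumerate(loader):
    ...  # existing loop body unchanged

n_batches = i_batch + 1
if n_batches == 0:
    raise ValueError(
        'DataLoader produced no batches; check the dataset path, '
        'train/val split indices, and batch_size.'
    )

loss = loss / n_batches
metrics = {k: v / n_batches for k, v in metrics.items()}
```

In practice, hitting the original error usually means the DataLoader passed to pass_epoch is empty (for example, an empty dataset folder or an empty sampler split), so the guard above mainly turns a confusing UnboundLocalError into an actionable message.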

cepa995, Oct 29 '20

Hi @cepa995, did you figure that out? I was trying to train a custom model, but I don't understand the given example. Thanks!
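
This is roughly what I'm trying, based on the finetuning example and the traceback above (a sketch; epochs, the loaders, optimizer, scheduler, and metrics are placeholder names from that example):

```python
# Train and evaluate one epoch at a time with pass_epoch (placeholder names).
for epoch in range(epochs):
    resnet.train()
    training.pass_epoch(
        resnet, loss_fn, train_loader, optimizer, scheduler,
        batch_metrics=metrics, show_running=True, device=device,
    )

    resnet.eval()
    training.pass_epoch(
        resnet, loss_fn, val_loader,
        batch_metrics=metrics, show_running=True, device=device,
    )
```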

MaarufB, Dec 20 '21

If anybody has any issues with this: I noticed that when my loss_fn and metrics variable names were uppercase I got the error, and it went away as soon as I used lowercase names for those two. I don't think anything else was different. I tried on Colab and locally and saw the same result both times.

Edit: it might actually be related to fixed_image_standardization.
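
For reference, the data setup I had in mind looks roughly like this (a sketch based on the repo's finetuning example; the data directory, batch size, and 80/20 split are placeholders). Checking that the loaders are non-empty before calling pass_epoch also rules out the UnboundLocalError above:

```python
import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms
from facenet_pytorch import fixed_image_standardization

# Transform stack from the finetuning example: float32 conversion, tensor,
# then the fixed standardization expected by the pretrained InceptionResnetV1.
trans = transforms.Compose([
    np.float32,
    transforms.ToTensor(),
    fixed_image_standardization,
])

dataset = datasets.ImageFolder('data/my_faces_cropped', transform=trans)  # placeholder path

img_inds = np.arange(len(dataset))
np.random.shuffle(img_inds)
train_inds = img_inds[:int(0.8 * len(img_inds))]
val_inds = img_inds[int(0.8 * len(img_inds)):]

train_loader = DataLoader(dataset, batch_size=32, sampler=SubsetRandomSampler(train_inds))
val_loader = DataLoader(dataset, batch_size=32, sampler=SubsetRandomSampler(val_inds))

# If either loader is empty, pass_epoch hits the UnboundLocalError above.
assert len(train_loader) > 0 and len(val_loader) > 0, 'empty DataLoader'
```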

becausejustyn, Feb 06 '23