
batch_norm error in fit() at end of training epoch

Open lcadalzo opened this issue 2 years ago

Describe the bug
At the end of this for loop, depending on batch_size and the first dimension of x_preprocessed, the final i_batch can end up containing a single sample. This occurs whenever x_preprocessed.shape[0] % batch_size == 1. In my case, x_preprocessed has shape (39209, 48, 48, 3) and batch_size is 8, and 39209 % 8 == 1. When i_batch has batch size 1, PyTorch's batch normalization fails, specifically in this function, which is called here. The end result is an error that looks like this:

File "/opt/conda/lib/python3.8/site-packages/art/estimators/classification/pytorch.py", line 1115, in forward
    x = module_(x)
        │       └ tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
        │                    0.0000,   0.0000,  11.4205,   0.000...
        └ BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
           │             │        └ {}
           │             └ (tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
           │                          0.0000,   0.0000,  11.4205,   0.00...
           └ <bound method _BatchNorm.forward of BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
    return F.batch_norm(
           │ └ <function batch_norm at 0x7f1e6b6bc550>
           └ <module 'torch.nn.functional' from '/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py'>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2280, in batch_norm
    _verify_batch_size(input.size())
    │                  │     └ <method 'size' of 'torch._C._TensorBase' objects>
    │                  └ tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
    │                               0.0000,   0.0000,  11.4205,   0.000...
    └ <function _verify_batch_size at 0x7f1e6b6bc4c0>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2248, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
                                                                                                      └ torch.Size([1, 300])

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 300])

This doesn't appear to be strictly an ART bug, but rather an error arising from the interaction between ART and PyTorch: a BatchNorm layer in training mode cannot compute batch statistics from a single sample.
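The underlying constraint can be demonstrated in plain PyTorch, without ART; a minimal sketch (the feature size of 300 is taken from the traceback above):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(300)   # same layer type and size as in the traceback
bn.train()                 # training mode: statistics are computed per batch

bn(torch.randn(8, 300))    # batch of 8: works
bn(torch.randn(1, 300))    # batch of 1: raises the same ValueError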

To Reproduce
Call a PyTorch estimator's fit() method with a dataset and batch_size such that the number of elements in the dataset % batch_size == 1. Below is a snippet of code that does so:

import numpy as np
from armory.baseline_models.pytorch import micronnet_gtsrb
from armory.scenarios.utils import to_categorical

model = micronnet_gtsrb.get_art_model({}, {})
num_samples = 5  # 5 % 4 == 1, so the last batch contains a single sample
x = np.random.randn(num_samples, 48, 48, 3).astype(np.float32)
y = to_categorical(np.random.randint(10, size=num_samples)).astype(np.float32)
model.fit(x, y, batch_size=4, nb_epochs=1)

Notice that if you change num_samples to 6, for example, the error goes away, since 6 % 4 == 2 and the last batch then contains two samples.

Using Armory: set the dataset "batch_size" in this config to 4 or 8 and run armory run <config>.

System information (please complete the following information):

  • OS: Ubuntu
  • Python version: 3.8.10
  • ART version: 1.10.1
  • PyTorch version: 1.10.2

lcadalzo avatar May 30 '22 15:05 lcadalzo

Hi @lcadalzo, thank you very much for reporting this issue!

beat-buesser avatar Jun 01 '22 21:06 beat-buesser

One way to deal with this would be to add a drop_last kwarg to fit(), similar to what PyTorch DataLoaders do (a sketch follows the quoted docstring). Here's how it is defined in https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html:

        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: ``False``)
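
A minimal sketch of what drop_last could look like in a batching loop (hypothetical; iter_batches and the loop structure are illustrative, not ART's actual implementation):

import numpy as np

def iter_batches(x, y, batch_size, drop_last=False):
    # Yield (x, y) minibatches; optionally drop a trailing batch that is
    # smaller than batch_size, mirroring torch.utils.data.DataLoader.
    n = x.shape[0]
    num_batches = n // batch_size if drop_last else int(np.ceil(n / batch_size))
    for i in range(num_batches):
        begin = i * batch_size
        end = min(begin + batch_size, n)
        yield x[begin:end], y[begin:end]

# With drop_last=True, 39209 samples at batch_size=8 yield 4901 full batches
# and the final single-sample batch is skipped, so BatchNorm never sees a
# batch of size 1 during training.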

davidslater avatar Oct 14 '22 19:10 davidslater