**batch_norm error in `fit()` at end of training epoch**
**Describe the bug**
At the end of the batching `for` loop in `fit()`, depending on `batch_size` and the first dimension of `x_preprocessed`, it's possible that `i_batch` ends up with a batch size of 1. This occurs when `x_preprocessed.shape[0] % batch_size == 1`. In my case, `x_preprocessed` has shape `(39209, 48, 48, 3)` and `batch_size` is 8, and `39209 % 8 == 1`. When `i_batch` has batch size 1, torch's batch normalization raises an error, specifically in torch's `_verify_batch_size` check, which is called from `F.batch_norm`. The end result is an error that looks like this:
File "/opt/conda/lib/python3.8/site-packages/art/estimators/classification/pytorch.py", line 1115, in forward
x = module_(x)
│ └ tensor([[ 7.4983, 33.5128, 3.2305, 0.0000, 60.0542, 0.0000, 0.0000,
│ 0.0000, 0.0000, 11.4205, 0.000...
└ BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
│ │ └ {}
│ └ (tensor([[ 7.4983, 33.5128, 3.2305, 0.0000, 60.0542, 0.0000, 0.0000,
│ 0.0000, 0.0000, 11.4205, 0.00...
└ <bound method _BatchNorm.forward of BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)>
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
return F.batch_norm(
│ └ <function batch_norm at 0x7f1e6b6bc550>
└ <module 'torch.nn.functional' from '/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py'>
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2280, in batch_norm
_verify_batch_size(input.size())
│ │ └ <method 'size' of 'torch._C._TensorBase' objects>
│ └ tensor([[ 7.4983, 33.5128, 3.2305, 0.0000, 60.0542, 0.0000, 0.0000,
│ 0.0000, 0.0000, 11.4205, 0.000...
└ <function _verify_batch_size at 0x7f1e6b6bc4c0>
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2248, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
└ torch.Size([1, 300])
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 300])
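For context, the leftover single-sample batch follows from simple arithmetic over the dataset size (a minimal sketch of the batching arithmetic, not ART's exact code):

```python
import numpy as np

num_samples, batch_size = 39209, 8

# Number of batches when the trailing partial batch is kept.
num_batches = int(np.ceil(num_samples / batch_size))  # 4902

# Size of the final batch: whatever remains after the full batches.
last_batch_size = num_samples - (num_batches - 1) * batch_size
print(last_batch_size)  # 1 -> a single sample reaches BatchNorm1d in training mode
```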
This doesn't appear to be strictly an ART bug, but rather an error arising from the interaction between ART and PyTorch.
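Indeed, the same failure reproduces with PyTorch alone (a minimal standalone example, independent of ART):

```python
import torch

bn = torch.nn.BatchNorm1d(300)  # same layer configuration as in the traceback
bn.train()  # training mode: batch statistics require more than one value per channel

x = torch.randn(1, 300)  # a batch of size 1
bn(x)  # ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 300])
```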
**To Reproduce**
Call a PyTorch estimator's `fit()` method with a dataset and `batch_size` such that the number of elements in the dataset modulo `batch_size` equals 1. Below is a snippet of code that does so:
```python
import numpy as np
from armory.baseline_models.pytorch import micronnet_gtsrb
from armory.scenarios.utils import to_categorical

model = micronnet_gtsrb.get_art_model({}, {})

num_samples = 5  # 5 % 4 == 1, so the final batch contains a single sample
x = np.random.randn(num_samples, 48, 48, 3).astype(np.float32)
y = to_categorical(np.random.randint(10, size=num_samples)).astype(np.float32)

model.fit(x, y, batch_size=4, nb_epochs=1)
```
Notice that if you change `num_samples` to 6, for example, the error goes away (6 % 4 == 2). Using Armory: set the dataset `"batch_size"` in this config to 4 or 8 and run `armory run <config>`.
**System information (please complete the following information):**
- OS: Ubuntu
- Python version: 3.8.10
- ART version: 1.10.1
- PyTorch version: 1.10.2
---

Hi @lcadalzo, thank you very much for reporting this issue!
One way to deal with this would be to add a `drop_last` kwarg to `fit()`, similar to what PyTorch dataloaders do. Here's how it is defined in https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html:
```
drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
    if the dataset size is not divisible by the batch size. If ``False`` and
    the size of dataset is not divisible by the batch size, then the last batch
    will be smaller. (default: ``False``)
```
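For illustration, here is a sketch of how a `drop_last` option could behave in `fit()`'s batching loop (`batch_indices` is a hypothetical helper, not ART's actual API):

```python
import numpy as np

def batch_indices(n_samples: int, batch_size: int, drop_last: bool = False):
    """Yield index ranges for each batch; optionally skip a trailing partial batch."""
    if drop_last:
        num_batches = n_samples // batch_size  # only full batches
    else:
        num_batches = int(np.ceil(n_samples / batch_size))  # keep the remainder
    for m in range(num_batches):
        yield range(m * batch_size, min((m + 1) * batch_size, n_samples))

# With the shapes from this issue:
sizes = [len(b) for b in batch_indices(39209, 8, drop_last=False)]
print(sizes[-1])  # 1 -> the single-sample batch that triggers the BatchNorm error
sizes = [len(b) for b in batch_indices(39209, 8, drop_last=True)]
print(sizes[-1])  # 8 -> the leftover sample is dropped
```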