CASA-Dialogue-Act-Classifier
Training fails when the last batch has only one sample.
This happens whenever the size of the training/validation/test set leaves a remainder of 1 when divided by the batch size. In my use case, the validation set has 18753 elements, so a batch size of 16 leaves only one element in the last batch, and the following error occurs:
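A quick way to check whether a given dataset size and batch size will produce a single-sample final batch is to look at the remainder. The helper below is a hypothetical illustration, not part of this repo or of PyTorch. It assumes the last partial batch is kept, which is the DataLoader default:

```python
import math


def last_batch_size(n_samples: int, batch_size: int) -> int:
    """Size of the final batch when the last partial batch is kept (hypothetical helper)."""
    remainder = n_samples % batch_size
    # A remainder of 0 means every batch, including the last, is full.
    return remainder if remainder else batch_size


n_val, bs = 18753, 16
num_batches = math.ceil(n_val / bs)
print(num_batches, last_batch_size(n_val, bs))  # 1173 1
```

With 18753 = 1172 × 16 + 1, the 1173rd validation batch contains exactly one sample.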
Epoch 0: 100%|█████████▉| 12208/12209 [1:55:15<00:00, 1.77it/s, loss=1.960, v_num=gsq1]
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    trainer.fit(model)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
    results = self.accelerator_backend.train()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train
    results = self.train_or_test()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 493, in train
    self.train_loop.run_training_epoch()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 589, in run_training_epoch
    self.trainer.run_evaluation(test_mode=False)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 87, in validation_step
    output = self.__validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 95, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 82, in validation_step
    loss = F.cross_entropy(logits, targets)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2260, in nll_loss
    if input.size(0) != target.size(0):
IndexError: dimension specified as 0 but tensor has no dimensions
A workaround is to remove one element from my dataset. A more complete solution is to have the DataLoader drop the last batch when it is incomplete (`drop_last=True`, as explained here). This discards at most batch_size - 1 samples per pass, which is negligible for large datasets but may matter for small ones.
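To illustrate what `drop_last=True` changes on a `torch.utils.data.DataLoader`, here is a pure-Python sketch of the batch sizes a DataLoader would yield over the validation set with and without the flag. `batch_sizes` is a hypothetical helper that only models the batching arithmetic; shuffling does not affect batch sizes, so it is ignored:

```python
def batch_sizes(n_samples: int, batch_size: int, drop_last: bool = False) -> list:
    """Model the batch sizes a DataLoader would yield (hypothetical sketch)."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    # The trailing partial batch is kept only when drop_last is False.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


kept = batch_sizes(18753, 16)
dropped = batch_sizes(18753, 16, drop_last=True)
print(len(kept), kept[-1])        # 1173 1
print(len(dropped), dropped[-1])  # 1172 16
print(18753 - sum(dropped))       # 1
```

In the real code, the equivalent would be passing `drop_last=True` when constructing the affected DataLoader (here, the validation one). Where exactly that DataLoader is built in this repo is not shown in this issue.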
Proposing a solution in PR https://github.com/macabdul9/CASA-Dialogue-Act-Classifier/pull/16