CASA-Dialogue-Act-Classifier

Training fails when the last batch has only one sample.

Open glicerico opened this issue 4 years ago • 2 comments

Training fails if the size of the training, validation, or test set leaves a remainder of 1 when divided by the batch size. In my use case, the validation set has 18753 elements, so a batch size of 16 leaves only one element in the last batch, and the following error occurs:

Epoch 0: 100%|█████████▉| 12208/12209 [1:55:15<00:00,  1.77it/s, loss=1.960, v_num=gsq1]
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    trainer.fit(model)                                                                                                                                                       
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit                                          
    results = self.accelerator_backend.train()                                                                                                                               
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train                            
    results = self.train_or_test()                                                                                                                                           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test                        
    results = self.trainer.train()                                                                                                                                           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 493, in train                                        
    self.train_loop.run_training_epoch()           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 589, in run_training_epoch
    self.trainer.run_evaluation(test_mode=False)                                       
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)         
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 87, in validation_step
    output = self.__validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 95, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 82, in validation_step
    loss = F.cross_entropy(logits, targets)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2260, in nll_loss
    if input.size(0) != target.size(0):
IndexError: dimension specified as 0 but tensor has no dimensions
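
The arithmetic behind the single-sample batch can be checked quickly. This is only a sketch: `last_batch_size` is a hypothetical helper written for illustration, not something from the repository, and it assumes a non-empty dataset.

```python
import math


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Return how many samples end up in the final batch of one pass."""
    remainder = num_samples % batch_size
    # A remainder of 0 means the final batch is a full one.
    return remainder if remainder else batch_size


# Numbers from the report: 18753 validation samples, batch size 16.
num_batches = math.ceil(18753 / 16)
print(num_batches, last_batch_size(18753, 16))  # 1173 1
```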

glicerico • Feb 12 '21 19:02

A workaround is to remove one element from my dataset. A more complete solution is to have the dataloader drop the last batch when it is incomplete (as explained here). Losing a few samples shouldn't matter much for large datasets, but may be undesirable for small ones.
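
A minimal sketch of the second option, assuming "ignore the last batch" refers to the `drop_last` argument of `torch.utils.data.DataLoader`. The dataset below is a hypothetical stand-in (33 random samples), not the project's data, and this is not necessarily how the repository builds its loaders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 33 samples, so a batch size of 16 leaves 1 over.
features = torch.randn(33, 8)
labels = torch.randint(0, 4, (33,))
dataset = TensorDataset(features, labels)

# Default behaviour keeps the incomplete final batch.
keep_sizes = [len(y) for _, y in DataLoader(dataset, batch_size=16)]

# drop_last=True discards the final batch when it is smaller than batch_size.
drop_sizes = [len(y) for _, y in DataLoader(dataset, batch_size=16, drop_last=True)]

print(keep_sizes)  # [16, 16, 1]
print(drop_sizes)  # [16, 16]
```

With `drop_last=True` the leftover sample is never evaluated, which is the trade-off for small datasets.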

glicerico • Feb 12 '21 20:02

Proposing a solution in PR https://github.com/macabdul9/CASA-Dialogue-Act-Classifier/pull/16

glicerico • Feb 12 '21 21:02