CVPR21Chal-SLR

IndexError in forward func of Conv3D/Sign_Isolated_Conv3D_clip.py

Open DavidMosoarca opened this issue 2 years ago • 0 comments

I trained the RGB Conv3D model for one epoch without modifying anything in the source code. The first training epoch finishes normally, but as soon as the run enters val_epoch it fails like this:

$ python Conv3D/Sign_Isolated_Conv3D_clip.py
...
######################Training Started######################
lr:  0.001
epoch   1 | iteration    80 | Loss 5.711482 | Acc 0.00%
epoch   1 | iteration   160 | Loss 5.423379 | Acc 0.00%
epoch   1 | iteration   240 | Loss 5.502132 | Acc 14.29%
epoch   1 | iteration   320 | Loss 5.452106 | Acc 0.00%
epoch   1 | iteration   400 | Loss 5.348779 | Acc 0.00%
epoch   1 | iteration   480 | Loss 5.369306 | Acc 0.00%
epoch   1 | iteration   560 | Loss 5.412856 | Acc 0.00%
epoch   1 | iteration   640 | Loss 5.431209 | Acc 0.00%
epoch   1 | iteration   720 | Loss 5.376038 | Acc 0.00%
epoch   1 | iteration   800 | Loss 5.504383 | Acc 0.00%
epoch   1 | iteration   880 | Loss 5.414754 | Acc 0.00%
epoch   1 | iteration   960 | Loss 5.481614 | Acc 0.00%
epoch   1 | iteration  1040 | Loss 5.402166 | Acc 0.00%
epoch   1 | iteration  1120 | Loss 5.561030 | Acc 0.00%
epoch   1 | iteration  1200 | Loss 5.304134 | Acc 14.29%
epoch   1 | iteration  1280 | Loss 5.452147 | Acc 0.00%
epoch   1 | iteration  1360 | Loss 5.429211 | Acc 0.00%
epoch   1 | iteration  1440 | Loss 5.503419 | Acc 0.00%
epoch   1 | iteration  1520 | Loss 5.407657 | Acc 0.00%
epoch   1 | iteration  1600 | Loss 5.423106 | Acc 0.00%
epoch   1 | iteration  1680 | Loss 5.427852 | Acc 0.00%
epoch   1 | iteration  1760 | Loss 5.387938 | Acc 0.00%
epoch   1 | iteration  1840 | Loss 5.491746 | Acc 0.00%
epoch   1 | iteration  1920 | Loss 5.375609 | Acc 0.00%
epoch   1 | iteration  2000 | Loss 5.529760 | Acc 0.00%
epoch   1 | iteration  2080 | Loss 5.462255 | Acc 0.00%
epoch   1 | iteration  2160 | Loss 5.383886 | Acc 0.00%
epoch   1 | iteration  2240 | Loss 5.354466 | Acc 0.00%
epoch   1 | iteration  2320 | Loss 5.439829 | Acc 0.00%
epoch   1 | iteration  2400 | Loss 5.484483 | Acc 0.00%
epoch   1 | iteration  2480 | Loss 5.388660 | Acc 0.00%
epoch   1 | iteration  2560 | Loss 5.336263 | Acc 0.00%
epoch   1 | iteration  2640 | Loss 5.511293 | Acc 0.00%
epoch   1 | iteration  2720 | Loss 5.430277 | Acc 0.00%
epoch   1 | iteration  2800 | Loss 5.447950 | Acc 0.00%
epoch   1 | iteration  2880 | Loss 5.434804 | Acc 0.00%
epoch   1 | iteration  2960 | Loss 5.414961 | Acc 0.00%
epoch   1 | iteration  3040 | Loss 5.452834 | Acc 0.00%
epoch   1 | iteration  3120 | Loss 5.405386 | Acc 0.00%
epoch   1 | iteration  3200 | Loss 5.377852 | Acc 0.00%
epoch   1 | iteration  3280 | Loss 5.378382 | Acc 0.00%
epoch   1 | iteration  3360 | Loss 5.481858 | Acc 0.00%
epoch   1 | iteration  3440 | Loss 5.544360 | Acc 0.00%
epoch   1 | iteration  3520 | Loss 5.439571 | Acc 0.00%
epoch   1 | iteration  3600 | Loss 5.497654 | Acc 0.00%
epoch   1 | iteration  3680 | Loss 5.374403 | Acc 0.00%
epoch   1 | iteration  3760 | Loss 5.400540 | Acc 0.00%
epoch   1 | iteration  3840 | Loss 5.482468 | Acc 0.00%
epoch   1 | iteration  3920 | Loss 5.428809 | Acc 0.00%
epoch   1 | iteration  4000 | Loss 5.400549 | Acc 0.00%
Average Training Loss of Epoch 1: 5.445218 | Acc: 0.39%
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 6 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Traceback (most recent call last):
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/Sign_Isolated_Conv3D_clip.py", line 165, in <module>
    logger, writer)
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/validation_clip.py", line 27, in val_epoch
    loss = criterion(outputs, labels.squeeze())
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/Sign_Isolated_Conv3D_clip.py", line 27, in forward
    nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
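For context, the "expected to be in range of [-1, 0]" part of the message means that target is a zero-dimensional tensor at this point, so target.unsqueeze(1) is out of range before gather is even reached. A minimal snippet that reproduces the same IndexError, assuming the failing validation batch contains a single sample so that labels.squeeze() collapses it to a scalar (the batch size of 1 and the 226-class output width are assumptions for illustration, not taken from the run above):

```python
import torch
import torch.nn.functional as F

# Assumed shapes for illustration: a last validation batch of size 1 and 226 classes.
outputs = torch.randn(1, 226)        # model output for a batch of one sample
labels = torch.tensor([[5]])         # label tensor of shape [1, 1]

target = labels.squeeze()            # squeeze() drops ALL size-1 dims -> 0-d scalar tensor
logprobs = F.log_softmax(outputs, dim=-1)

# A 0-d tensor can only be unsqueezed at dim -1 or 0, so dim=1 raises:
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
```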
  • Do you have any suggestions on what could be wrong here, and why the forward function behaves this way?
  • More importantly, what could be the solution to this problem?
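One possible workaround (an assumption on my part, not a confirmed fix from the maintainers) would be to keep the target one-dimensional regardless of the batch size before passing it to the criterion, e.g. in validation_clip.py at the line shown in the traceback:

```python
# Hypothetical change in validation_clip.py; variable names are taken from the
# traceback above, and the exact label shape depends on the dataset code.
#
# Instead of:
#   loss = criterion(outputs, labels.squeeze())
# flatten the labels so they stay 1-D even when the batch has a single sample:
loss = criterion(outputs, labels.view(-1))   # shape [batch_size], never a 0-d scalar
```

With a 1-D target, target.unsqueeze(1) in the loss produces a [batch_size, 1] index tensor and the gather call should no longer hit the dimension error, assuming the 0-d target is indeed the cause here.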

DavidMosoarca · Jun 10 '22 05:06