Transformers-Tutorials

Error Training

phamkhactu opened this issue • 0 comments

Hi @NielsRogge, thanks for the great tutorial on training TrOCR. I followed your guide step by step, but when I start training I get this error:

ex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
  0%|                                                                                          | 0/67285 [00:05<?, ?it/s]

 File "/home/tupk/anaconda3/envs/ocr/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 144, in forward
    self.weights = self.weights.to(self._float_tensor)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
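The assertion `srcIndex < srcSelectDimSize` is raised by an index-select kernel when an index is at least as large as the dimension it indexes, for example a token ID past the end of an embedding table. Because CUDA errors are reported asynchronously, the `modeling_trocr.py` frame in the traceback may not be where the bad index actually occurred. A minimal sketch of following the log's advice from inside a training script, assuming the variable is set before PyTorch initializes CUDA:

```python
import os

# Make CUDA kernel launches synchronous so the traceback points at the op that failed.
# Set this before CUDA is initialized; the safest place is before `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and build the model/trainer only after this line
```

Synchronous launches slow training down, so this is only meant for a debugging run.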

I printed the batch input and it looks fine:

[[0.7647, 0.7647, 0.7647,  ..., 0.4510, 0.4510, 0.4510],
          [0.7647, 0.7647, 0.7647,  ..., 0.4510, 0.4510, 0.4510],
          [0.7647, 0.7647, 0.7647,  ..., 0.4510, 0.4510, 0.4510],
          ...,
          [0.6235, 0.6235, 0.6235,  ..., 0.5608, 0.5608, 0.5608],
          [0.6235, 0.6235, 0.6235,  ..., 0.5608, 0.5608, 0.5608],
          [0.6235, 0.6235, 0.6235,  ..., 0.5608, 0.5608, 0.5608]],

         [[0.5451, 0.5451, 0.5451,  ..., 0.2000, 0.2000, 0.2000],
          [0.5451, 0.5451, 0.5451,  ..., 0.2000, 0.2000, 0.2000],
          [0.5451, 0.5451, 0.5451,  ..., 0.2000, 0.2000, 0.2000],
          ...,
          [0.3569, 0.3569, 0.3569,  ..., 0.2941, 0.2941, 0.2941],
          [0.3569, 0.3569, 0.3569,  ..., 0.2941, 0.2941, 0.2941],
          [0.3569, 0.3569, 0.3569,  ..., 0.2941, 0.2941, 0.2941]]]],
       device='cuda:0'), 'labels': tensor([[    0, 53593,  5142,  ...,  -100,  -100,  -100],
        [    0, 51870,  1117,  ...,  -100,  -100,  -100],
        [    0,  1939, 38817,  ...,  -100,  -100,  -100],
        ...,
        [    0,  7221, 49581,  ...,  -100,  -100,  -100],
        [    0, 22980,  2870,  ...,  -100,  -100,  -100],
        [    0, 12894, 15165,  ...,  -100,  -100,  -100]], device='cuda:0')}
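Since the assertion reports an out-of-range index, one thing worth checking in this batch is whether every label ID fits inside the decoder's embedding table. If the labels were produced by a tokenizer whose vocabulary is larger than the decoder's, IDs can exceed it. A minimal sketch in plain Python; the vocabulary size of 50265 is an ASSUMPTION for illustration, and the rows below keep only the entries visible in the printed tensor (the elided middle columns are not reproduced):

```python
IGNORE_INDEX = -100  # label value skipped by the loss (padding positions)


def find_out_of_range_labels(labels, vocab_size, ignore_index=IGNORE_INDEX):
    """Return (row, col, token_id) for every label that cannot index an
    embedding table with `vocab_size` rows. `ignore_index` entries are skipped."""
    bad = []
    for row, seq in enumerate(labels):
        for col, token_id in enumerate(seq):
            if token_id == ignore_index:
                continue
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Visible entries of the first three label rows from the printout above.
labels = [
    [0, 53593, 5142, -100, -100, -100],
    [0, 51870, 1117, -100, -100, -100],
    [0, 1939, 38817, -100, -100, -100],
]

# ASSUMPTION: decoder vocabulary size; read the real value from your model instead.
VOCAB_SIZE = 50265

print(find_out_of_range_labels(labels, VOCAB_SIZE))
```

With these assumed values the sketch prints `[(0, 1, 53593), (1, 1, 51870)]`. On a real batch, assuming a `VisionEncoderDecoderModel` as in the tutorial, the check would be something like `find_out_of_range_labels(batch["labels"].tolist(), model.config.decoder.vocab_size)`, run on CPU before the batch ever reaches the GPU.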

phamkhactu, Sep 21 '22 04:09