SpeechT5
SpeechT5-tts fine-tuned on Chinese
I used a Colab notebook to fine-tune this model. When I run trainer.train(), it fails with the following error:
in <cell line: 2>:2 │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1839 in _inner_training_loop │
│ │
│ 1836 │ │ self.state.is_world_process_zero = self.is_world_process_zero() │
│ 1837 │ │ │
│ 1838 │ │ # tr_loss is a tensor to avoid synchronization of TPUs through .item() │
│ ❱ 1839 │ │ tr_loss = torch.tensor(0.0).to(args.device) │
│ 1840 │ │ # _total_loss_scalar is updated everytime .item() has to be called on tr_loss an │
│ 1841 │ │ self._total_loss_scalar = 0.0 │
│ 1842 │ │ self._globalstep_last_logged = self.state.global_step │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be
incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I am running on a GPU. Why does this error happen?
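For reference, here is a minimal sketch of how the debugging hint in the error message could be applied in a notebook. Because CUDA errors are reported asynchronously, the line shown in the traceback (creating tr_loss) is probably not where the assert actually fired; setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the traceback points at the failing call. The second part is only an assumption about the cause: device-side asserts are often triggered by an index outside the range of an embedding table, so the hypothetical helper `find_out_of_range_ids` checks token ids against a vocabulary size. The toy batch and the vocabulary size of 81 are illustrative values, not taken from this model.

```python
import os

# Must be set before CUDA is initialized; in Colab, restart the runtime and
# run this cell before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_range_ids(batch_ids, vocab_size):
    """Return (row, column, id) for every id outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch_ids):
        for col, token_id in enumerate(ids):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Toy data standing in for tokenized examples (hypothetical values).
example_batch = [[4, 7, 81], [5, 2, 9]]
print(find_out_of_range_ids(example_batch, vocab_size=81))
```

Running one training step on the CPU is another way to get a readable error, since an out-of-range index there raises an ordinary Python exception instead of a CUDA assert.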