SpeechT5

SpeechT5-tts fine-tuned on Chinese

Open · qlmbeck opened this issue 1 year ago · 4 comments

I used a Colab notebook to fine-tune this model. When I run `trainer.train()`, it fails with the following error:

```
in <cell line: 2>:2

/usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1662 in train

  1659         inner_training_loop = find_executable_batch_size(
  1660             self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
  1661         )
❱ 1662         return inner_training_loop(
  1663             args=args,
  1664             resume_from_checkpoint=resume_from_checkpoint,
  1665             trial=trial,

/usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1839 in _inner_training_loop

  1836         self.state.is_world_process_zero = self.is_world_process_zero()
  1837
  1838         # tr_loss is a tensor to avoid synchronization of TPUs through .item()
❱ 1839         tr_loss = torch.tensor(0.0).to(args.device)
  1840         # _total_loss_scalar is updated everytime .item() has to be called on tr_loss an
  1841         self._total_loss_scalar = 0.0
  1842         self._globalstep_last_logged = self.state.global_step

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
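The error is raised at `tr_loss = torch.tensor(0.0).to(args.device)`, which only moves a scalar onto the GPU. Because CUDA reports kernel errors asynchronously, and a device-side assert leaves the CUDA context unusable until the process restarts, this line is most likely just where an earlier failure surfaced (possibly from a previous cell), not its cause. A minimal sketch of the debugging step the message suggests, assuming a freshly restarted Colab runtime so the variable is set before CUDA is initialized:

```python
import os

# CUDA kernel errors are reported asynchronously, so the traceback can point
# at an unrelated line. Forcing synchronous kernel launches makes the error
# surface at the operation that actually failed.
# This only takes effect if it runs before CUDA is initialized: put it in the
# first cell of a restarted runtime, before importing torch or transformers.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Afterwards: import torch / transformers, rebuild the model, data collator and
# Trainer as before, then call trainer.train() again.
```

With synchronous launches training runs slower, so this setting is meant only for locating the failing operation, not for the actual fine-tuning run.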

I am running on a GPU, so why does this error occur?
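A device-side assert means a bounds check inside a CUDA kernel failed. In fine-tuning setups like this one, a frequent culprit is an out-of-range index: a token ID at or above the model's vocabulary size (for example, in the hypothetical case where tokens for Chinese characters were added to the tokenizer without resizing the embeddings) or a sequence longer than the positional-embedding table. The sketch below checks preprocessed `input_ids` on the CPU, where such problems are easy to inspect. `find_out_of_range` is a hypothetical helper, and reading the limits from `model.config.vocab_size` and `model.config.max_text_positions` assumes the standard Hugging Face `SpeechT5Config` attribute names.

```python
from typing import Dict, List, Sequence


def find_out_of_range(
    batch_input_ids: Sequence[Sequence[int]],
    vocab_size: int,
    max_positions: int,
) -> Dict[int, List[str]]:
    """Return {example_index: [problems]} for examples whose token IDs fall
    outside [0, vocab_size) or whose length exceeds max_positions."""
    problems: Dict[int, List[str]] = {}
    for i, ids in enumerate(batch_input_ids):
        issues: List[str] = []
        # IDs outside the embedding table would index past its rows on the GPU.
        bad = sorted({t for t in ids if t < 0 or t >= vocab_size})
        if bad:
            issues.append(f"token ids out of range [0, {vocab_size}): {bad}")
        # Sequences longer than the positional table would also index past it.
        if len(ids) > max_positions:
            issues.append(f"length {len(ids)} exceeds max_positions {max_positions}")
        if issues:
            problems[i] = issues
    return problems


# Hypothetical usage against the notebook's processed dataset:
# ids = [ex["input_ids"] for ex in dataset["train"]]
# print(find_out_of_range(ids, model.config.vocab_size, model.config.max_text_positions))

# Toy run with made-up limits:
print(find_out_of_range([[4, 5, 6], [4, 90], list(range(10))], vocab_size=81, max_positions=8))
```

An empty result only rules out these two causes for the text inputs; the speech labels and speaker embeddings would need analogous checks.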

qlmbeck · Apr 23 '23 09:04