Trainer stops working without error messages
I'm trying to train TTS on a custom dataset (LJSpeech-style) with a non-English alphabet. Here is the command I use:
python TTS\Lib\site-packages\TTS\bin\train_tts.py --config_path data/config.json
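For context, here is a rough sketch of the kind of `data/config.json` I pass. The field names follow the Coqui TTS Tacotron2/dataset config dataclasses as I understand them for the version I'm using, and all values below are placeholders rather than my exact settings:

```python
# Illustrative only: a minimal LJSpeech-style Tacotron2 config written out as
# data/config.json. Field names follow the Coqui TTS config dataclasses;
# values are placeholders, not the actual file used for this run.
import json

config = {
    "model": "tacotron2",
    "run_name": "videon-1",          # matches the run folder seen in the log below
    "output_path": "./2nd-run",
    "batch_size": 32,
    "eval_batch_size": 16,
    "num_loader_workers": 4,
    "epochs": 1000,
    "text_cleaner": "basic_cleaners",
    "use_phonemes": False,
    "print_step": 25,
    "datasets": [
        {
            "name": "ljspeech",               # LJSpeech-style metadata.csv formatter
            "path": "data/my_dataset/",       # placeholder path
            "meta_file_train": "metadata.csv",
        }
    ],
}

with open("data/config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)
```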
Here are the messages I get:
C:\Users\user\Desktop\patents\TTS\TTS\lib\site-packages\TTS\tts\models\tacotron2.py:272: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = (
C:\Users\user\Desktop\patents\TTS\TTS\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
--> STEP: 0/14 -- GLOBAL_STEP: 0
| > decoder_loss: 1.37356 (1.37356)
| > postnet_loss: 3.56385 (3.56385)
| > stopnet_loss: 1.32900 (1.32900)
| > decoder_coarse_loss: 1.37599 (1.37599)
| > decoder_ddc_loss: 0.00168 (0.00168)
| > ga_loss: 0.01330 (0.01330)
| > decoder_diff_spec_loss: 0.12558 (0.12558)
| > postnet_diff_spec_loss: 4.52326 (4.52326)
| > decoder_ssim_loss: 0.71161 (0.71161)
| > postnet_ssim_loss: 0.70824 (0.70824)
| > loss: 5.17926 (5.17926)
| > align_error: 0.99135 (0.99135)
| > grad_norm: 2.13356 (2.13356)
| > current_lr: 0.00000
| > step_time: 14.91710 (14.91709)
| > loader_time: 9.09470 (9.09466)
C:\Users\user\Desktop\patents\TTS\TTS\lib\site-packages\TTS\tts\models\tacotron2.py:276: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = mel_lengths // self.decoder.r
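(These two are deprecation warnings, not errors; the first one even spells out the replacement it wants. A minimal standalone sketch of the substitution it suggests — not the actual tacotron2.py code, just an illustration with made-up lengths:)

```python
# Standalone illustration of the __floordiv__ deprecation from the warning above.
import torch

mel_lengths = torch.tensor([160, 256, 300])
r = 5  # example decoder reduction factor

# Deprecated form that triggers the UserWarning:
alignment_lengths_old = mel_lengths // r

# Explicit form recommended by the warning (true floor division):
alignment_lengths_new = torch.div(mel_lengths, r, rounding_mode="floor")

# For non-negative lengths the two agree, so behavior is unchanged here.
assert torch.equal(alignment_lengths_old, alignment_lengths_new)
```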
But right after this it stops working: no new messages are printed, no checkpoints are saved, and CPU usage drops to near zero, yet the program does not exit.
P.S. Right after I posted this, it printed new output:
BEST MODEL : ./2nd-run\videon-1-June-27-2022_06+24PM-0000000\best_model_14.pth
Number of output frames: 5
EPOCH: 1/1000 --> ./2nd-run\videon-1-June-27-2022_06+24PM-0000000
DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: False
| > 2 not found characters:
| >
| >
| > Number of instances : 857
| > Preprocessing samples
| > Max text length: 1128
| > Min text length: 2
| > Avg text length: 88.03733955659277
| > Max audio length: 1810702.0
| > Min audio length: 22756.0
| > Avg audio length: 170498.6674445741
| > Num. instances discarded samples: 0
| > Batch group size: 256.
TRAINING (2022-06-28 09:50:49)
So maybe the issue is resolved, but training is very slow for such a small dataset (857 records).
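One thing I notice in the output above is the "2 not found characters" line, which I assume means a couple of characters in my non-English alphabet are missing from the character set the tokenizer uses. A quick sketch of how the character set can be collected from an LJSpeech-style metadata.csv (hypothetical helper; the path is a placeholder):

```python
# Sketch: collect the set of characters used in an LJSpeech-style metadata.csv,
# so the characters list in config.json can cover a non-English alphabet.
# Assumes the usual "id|text|normalized_text" pipe-separated format.
from pathlib import Path

def collect_characters(metadata_path: str) -> str:
    chars = set()
    for line in Path(metadata_path).read_text(encoding="utf-8").splitlines():
        fields = line.split("|")
        if len(fields) >= 2:
            chars.update(fields[-1])  # use the (normalized) text column
    return "".join(sorted(chars))

if __name__ == "__main__":
    print(collect_characters("data/my_dataset/metadata.csv"))  # placeholder path
```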
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our Discourse page for further help: https://discourse.mozilla.org/c/tts