TTS
[Bug] Training XTTSv2 with DDP leads to weird training lags
Describe the bug
Hello, training XTTSv2 with DDP leads to weird lags: training gets stuck with no errors. The machine has 6x RTX A6000 GPUs and 512 GB RAM.
Here is the GPU load monitoring graph. Purple is gpu0, green is gpu1 (all the other GPUs behave like gpu1).
With 2 or 4 GPUs the situation remains the same.
I think there may be an error in the Trainer or in the XTTS scripts. It may also matter that my dataset is rather large: 2000 hours of a single language.
To Reproduce
python -m trainer.distribute --script recipes/ljspeech/xtts_v2/train_gpt_xtts.py --gpus 0,1,2,3,4,5
Expected behavior
Training should not get stuck.
Logs
No response
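Since the stall produces no error output, one possible way to collect logs is to dump the Python stack of every process at a fixed interval and compare the dumps between ranks. The following is only a minimal sketch using the standard library `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the log file naming, and the use of a `RANK` environment variable are assumptions made for illustration; they are not part of the Trainer or the XTTS recipe.

```python
import faulthandler
import os


def arm_hang_watchdog(timeout_s=600.0, log_dir="."):
    """Dump the stacks of all threads in this process every `timeout_s` seconds.

    Hypothetical helper, not part of the Trainer. If training is stuck,
    successive dumps in the log file will show the same stack frames.
    """
    # Assumption: the launcher exposes the process rank as the RANK
    # environment variable; fall back to "0" when it is not set.
    rank = os.environ.get("RANK", "0")
    path = os.path.join(log_dir, f"hang_trace_rank{rank}.log")
    # faulthandler writes to the file descriptor directly, so the file
    # has to stay open for as long as the watchdog is armed.
    log_file = open(path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file, path


def disarm_hang_watchdog(log_file):
    """Stop the periodic dumps and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

Calling `arm_hang_watchdog()` near the top of the training script would produce one log file per process. Comparing the rank 0 file with the others may show whether gpu0 is waiting in data loading, logging, or a collective operation while the rest wait for it.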
Environment
tts version: latest
Additional context
No response