TTS [Bug] Cannot fine tune YourTTS with reinit_text_encoder = True due to Runtime Error

Open Ca-ressemble-a-du-fake opened this issue 2 years ago • 0 comments

Describe the bug

Hi,

I am training the YourTTS recipe on a French dataset with ResNet = 1. Voice similarity and audio quality are great, but some mispronunciations remain even after 305k steps and they are not improving (they have been there since step 60k).

After watching this video, I understood that the text encoder may be overfitting, so I decided to reset the text encoder and train it for a few thousand steps until the pronunciation is OK. My goal is to try to "save" a model that took a week to train.

So in model_args = VitsArgs(...) I added reinit_text_encoder = True to the list of arguments, and I set the restore path to my 305k-step model.
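A minimal sketch of that change, assuming the stock YourTTS recipe layout. The checkpoint path is a placeholder, and every other argument of the recipe is deliberately left out rather than guessed:

```python
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    # ... all arguments from the original recipe stay unchanged ...
    reinit_text_encoder=True,  # ask the model to re-initialise the text encoder
)

# Placeholder, not the real path: point this at the 305k-step checkpoint.
RESTORE_PATH = "/path/to/checkpoint_305000.pth"
```

The reinit_DP and detach_dp_input options mentioned further down would be passed to VitsArgs in the same way.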

But after about 1h30 I start to get tensorboardX.x2num:NaN or Inf found in input tensor warnings, then an increasing number of losses become NaN, and finally I get:

if torch.min(inputs) < left or torch.max(inputs) > right:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
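One plausible reading of this message, offered as an assumption rather than a confirmed diagnosis: once the values reaching this check have all become NaN, any range mask applied beforehand selects nothing, so torch.min receives an empty tensor. The sketch below reproduces that mechanism in isolation. The tensor values and tail_bound are made up for illustration and are not taken from the TTS code:

```python
import torch

# Hypothetical values: a batch in which every element has already become NaN.
inputs = torch.full((4,), float("nan"))
tail_bound = 5.0  # assumed interval half-width, chosen only for illustration

# NaN fails every comparison, so no element counts as "inside" the interval.
inside = (inputs >= -tail_bound) & (inputs <= tail_bound)
selected = inputs[inside]
print("selected elements:", selected.numel())

message = None
try:
    # A full reduction over an empty tensor has no defined result.
    torch.min(selected)
except RuntimeError as err:
    message = str(err)
    print(message)
```

If this is what happens here, the RuntimeError would be a late symptom, and the first NaN loss is the event worth tracking down.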

I also tried adding reinit_DP=True, but the same error appeared. I also tried detach_dp_input = False, as explained in the video, without success. Finally, I tried use_phonemes = True, because my previous VITS models trained with phonemes did not have such mispronunciations, but the same error still appeared.

I searched the web and found that @erogol had suggested a bug in torch, but I have not changed anything in my environment, nor have I rebooted my computer. I therefore doubt it applies to my case, since I have been able to train VITS and YourTTS without errors for months.

Please note: if I continue training with the original recipe (i.e. without reinit_text_encoder), it trains normally.

What can I do to retrain only the text encoder so that the mispronunciations disappear? Is it even possible to correct them? (I would say yes, since it is shown in the video.)

To Reproduce

1. Train a model for some steps (I only tried with my last checkpoint, which has reached 305k steps).
2. Stop the training and add reinit_text_encoder = True to model_args in the YourTTS recipe.
3. Set RESTORE_PATH to the checkpoint you want to train from.
4. Launch the recipe.
5. Wait a little while and the RuntimeError should occur.
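When reproducing, it may help to note which loss turns non-finite first instead of waiting for the final crash. The helper below is a minimal, library-independent sketch. The loss names and values are hypothetical and do not come from an actual training log:

```python
import math


def first_non_finite(losses):
    """Return the name of the first loss that is NaN or Inf, else None."""
    for name, value in losses.items():
        if not math.isfinite(value):
            return name
    return None


# Hypothetical loss values for two consecutive logging steps.
history = [
    {"loss_gen": 2.31, "loss_kl": 1.02, "loss_duration": 0.87},
    {"loss_gen": 2.40, "loss_kl": float("nan"), "loss_duration": 0.91},
]

for step, losses in enumerate(history):
    bad = first_non_finite(losses)
    if bad is not None:
        print(f"step {step}: {bad} is non-finite")
        break
```

Feeding such a check with the per-step loss dictionary would show which term diverges first after the text encoder is reset.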

Expected behavior

YourTTS fine tuning with reinit_text_encoder = True should work.

Logs

No response

Environment

- TTS version : 0.10.0
- Pytorch version : 1.13.1+cu117
- Python : 3.10.6
- OS : Ubuntu 22.04

Additional context

No response

Feb 18 '23 04:02 Ca-ressemble-a-du-fake