wav output has no sound

Open kudzaijaure-dot opened this issue 3 years ago • 3 comments

@souvikg544 I've tried running this part of the script, but it raises a "no argument" error for these two options:

  --model_path $test_ckpt \
  --config_path $test_config \

I manually copied the paths of best_model.pth and config.json under tts_train_dir, and then it runs with no error, but the output wav file has no speech, just a monotone buzzing sound. Also, TensorBoard wouldn't launch, so I skipped that step; this could be related.
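A "no argument" error for `--model_path` and `--config_path` is what one would expect if `$test_ckpt` and `$test_config` were never set, though that is an assumption about this notebook. As a minimal sketch, the paths could be looked up instead of copied by hand. The helper name `latest_run_files`, the `run-*` folder pattern, and the file names `best_model.pth` and `config.json` are assumptions for illustration:

```python
from pathlib import Path


def latest_run_files(train_dir):
    """Return (checkpoint, config) paths from the most recently modified run folder.

    Assumes each training run lives in a sub-folder named run-* that
    contains best_model.pth and config.json (hypothetical layout).
    """
    runs = [p for p in Path(train_dir).glob("run-*") if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run-* folders under {train_dir}")

    # Pick the folder with the newest modification time.
    newest = max(runs, key=lambda p: p.stat().st_mtime)

    ckpt = newest / "best_model.pth"
    config = newest / "config.json"
    for f in (ckpt, config):
        if not f.is_file():
            raise FileNotFoundError(f"missing {f}")
    return str(ckpt), str(config)
```

In a notebook, the two returned strings could then be passed to `--model_path` and `--config_path` in place of the empty shell variables.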

kudzaijaure-dot, Sep 14 '22 13:09

Thank you for raising the issue. You have used the right paths. The problem is that TTS speech generation from text requires at least 100000 epochs to produce suitable output. It also requires a large audio dataset. You can use Colab Pro or AWS to achieve these results.

This is the same issue you are describing. Refer to the comments on the answer:

https://stackoverflow.com/questions/66307611/how-do-i-get-started-training-a-custom-voice-model-with-mozilla-tts-on-ubuntu-20

souvikg544, Sep 14 '22 15:09

Also, if anyone gets this working on Colab, do let me know how.

souvikg544, Sep 14 '22 15:09

@souvikg544 I tried using over an hour of cleaned data. Training took about 50 minutes, but out.wav still has no speech, just a buzzing sound for a second. Every text extraction was successful, yielding 483 extracted 10-second clips. TensorBoard did not launch, so I skipped that stage. AudioProcessor from TTS.utils.audio shows an error the first time I try to run the command, but runs normally with no changes the second time. Inference code below:

  !tts --text "Text for TTS, to test how well the president of the united states speaks. Maybe what it requires is a verly long sentence that does the job" \
    --model_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/best_model.pth' \
    --config_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/config.json' \
    --out_path out.wav
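To put numbers on "a buzzing sound for a second", the output file's duration and overall level can be checked with the Python standard library. This is only a rough sketch: the helper name `wav_stats` is hypothetical, and it assumes out.wav is 16-bit PCM, which may not hold for every setup.

```python
import array
import math
import sys
import wave


def wav_stats(path):
    """Return (duration_seconds, rms_amplitude) for a 16-bit PCM wav file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    duration = n_frames / rate
    if not samples:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms
```

A duration of about one second for a sentence this long would suggest the model stops decoding early, which fits an under-trained checkpoint, though the numbers alone cannot confirm the cause.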

The model recorded 100 epochs of training on 1 hour of data, so would the suggested 100000 epochs require 1000 hours of audio?
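One way to read the numbers: an epoch is one pass over the same dataset, so more epochs need more training time, not more audio. Taking the suggested figure at face value and assuming training time scales linearly with epoch count on the same data and hardware (an assumption, since Colab hardware varies), the projection can be sketched as below. The function name is hypothetical.

```python
def projected_training_hours(epochs_done, minutes_taken, target_epochs):
    """Linearly extrapolate training time from an observed run."""
    minutes_per_epoch = minutes_taken / epochs_done
    return target_epochs * minutes_per_epoch / 60


# Observed in this thread: about 100 epochs in about 50 minutes.
print(round(projected_training_hours(100, 50, 100_000), 1))  # 833.3
```

So on the same one hour of audio, 100000 epochs would take roughly 833 hours of training under these assumptions. Whether one hour of audio is enough is a separate question about dataset size.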

kudzaijaure-dot, Sep 20 '22 14:09