wav output has no sound
@souvikg544 I've tried running this part of the script, but it raises a "no argument" error for these two options:
--model_path $test_ckpt \
--config_path $test_config \
I manually copied in the paths of best_model.pth and config.json under tts_train_dir, and then it runs with no error, but the output wav file has no speech, just a monotone buzzing sound. Also, TensorBoard wouldn't launch, so I just skipped that step; it could be related.
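One possible cause of the "no argument" error is that `$test_ckpt` and `$test_config` were empty when the command ran, so the two flags received no value. Below is a minimal sketch for locating the two files instead of copying paths by hand. It assumes each run folder under tts_train_dir contains files named `best_model.pth` and `config.json`; the helper name `latest_run_files` is hypothetical and not part of the TTS package.

```python
from pathlib import Path


def latest_run_files(train_dir):
    """Return (model_path, config_path) from the most recently modified run folder.

    Assumption: a usable run folder holds both best_model.pth and config.json.
    """
    runs = [
        d for d in Path(train_dir).iterdir()
        if d.is_dir()
        and (d / "best_model.pth").is_file()
        and (d / "config.json").is_file()
    ]
    if not runs:
        raise FileNotFoundError(
            f"no run folder with best_model.pth and config.json in {train_dir}"
        )
    # Pick the run folder that was modified last.
    newest = max(runs, key=lambda d: d.stat().st_mtime)
    return newest / "best_model.pth", newest / "config.json"


# Usage (the directory is whatever was passed as the training output folder):
# model_path, config_path = latest_run_files("tts_train_dir")
```

The returned paths can then be passed to `--model_path` and `--config_path` directly.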
Thank you for pulling up the issue. You have added the right file paths. The problem is that TTS speech generation from text requires at least 100000 epochs to get a suitable output. It also requires a big audio dataset. You can use Colab Pro or AWS to achieve the results.
This is the same issue you are describing! Refer to the comments on the solution:
https://stackoverflow.com/questions/66307611/how-do-i-get-started-training-a-custom-voice-model-with-mozilla-tts-on-ubuntu-20
Also, if anyone has found a solution on Colab, do let me know the workaround.
@souvikg544 I tried using over an hour of cleaned data. Training took about 50 minutes, but out.wav still has no speech, just a buzzing sound for a second. Every text extraction was successful, with 483 extracted 10-second clips. TensorBoard would not launch, so I skipped that stage. AudioProcessor from TTS.utils.audio shows an error the first time I run the command, but runs normally, with no changes, the second time. Inference command below:
!tts --text "Text for TTS, to test how well the president of the united states speaks. Maybe what it requires is a very long sentence that does the job" \
--model_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/best_model.pth' \
--config_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/config.json' \
--out_path out.wav
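To put numbers on "a buzzing sound for a second", here is a small sketch that reads the duration and peak level of the output file using only the Python standard library. It assumes out.wav is 16-bit PCM, which I have not verified for this setup; the helper name `wav_summary` is hypothetical.

```python
import struct
import wave


def wav_summary(path):
    """Return (duration_seconds, peak) for a 16-bit PCM WAV file.

    peak is the largest absolute sample value scaled to the range 0..1.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    # Interpret the raw bytes as little-endian signed 16-bit samples.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0) / 32768
    return n_frames / rate, peak


# Usage:
# duration, peak = wav_summary("out.wav")
# print(f"{duration:.2f} s, peak {peak:.3f}")
```

A duration of about one second for a sentence this long would suggest the model stopped generating early, rather than the file being written incorrectly.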
The model recorded 100 epochs of training on 1 hour of data, so would the suggested 100000 require 1000 hours of audio?
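As a rough check on that figure: an epoch is one full pass over the dataset, so a larger epoch count means longer training on the same data rather than more audio. The sketch below extrapolates wall-clock time only, assuming the time per epoch stays constant at the rate of the run above (100 epochs in about 50 minutes) on the same hardware; the function name is illustrative.

```python
def scaled_training_minutes(measured_epochs, measured_minutes, target_epochs):
    """Linearly extrapolate wall-clock training time to a target epoch count."""
    minutes_per_epoch = measured_minutes / measured_epochs
    return minutes_per_epoch * target_epochs


# 100 epochs took about 50 minutes; scale that to 100000 epochs.
minutes = scaled_training_minutes(100, 50, 100_000)
print(f"{minutes:.0f} minutes, about {minutes / 60:.0f} hours "
      f"or {minutes / (60 * 24):.1f} days")
# prints: 50000 minutes, about 833 hours or 34.7 days
```

So at this rate the suggested epoch count would mean roughly five weeks of continuous training on the same 1 hour of audio, which says nothing about whether that much data is enough.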