Error reported during fine-tuning inference
Describe the bug

Running inference with a fine-tuned XTTS v2 model fails with an AssertionError about an unsupported language:

tts --text "Text for TTS" \
    --model_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000 \
    --config_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000/config.json \
    --out_path output02.wav
Using model: xtts
Text: Text for TTS
Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
  File "/opt/conda/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/home/TTS-dev/TTS/bin/synthesize.py", line 468, in main
    wav = synthesizer.tts(
  File "/home/TTS-dev/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/home/TTS-dev/TTS/tts/models/xtts.py", line 399, in synthesize
    "zh-cn" if language == "zh" else language in self.config.languages
AssertionError:  ❗ Language None is not supported. Supported languages are ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja', 'hi']
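For context, the assertion firing here is the XTTS language guard. A minimal sketch of the equivalent logic (the real check lives in TTS/tts/models/xtts.py; the language list below is copied from the error message):

```python
# Supported language codes, as listed in the AssertionError above.
SUPPORTED_LANGUAGES = [
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
]

def check_language(language):
    """Sketch of the XTTS language guard: map 'zh' to 'zh-cn', then assert membership."""
    language = "zh-cn" if language == "zh" else language
    assert language in SUPPORTED_LANGUAGES, f"Language {language} is not supported."
    return language
```

Because `--language_idx` was omitted from the command, `language` arrives here as `None` and the assertion fails before synthesis starts.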
To Reproduce

tts --text "Text for TTS" \
    --model_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000 \
    --config_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000/config.json \
    --out_path output02.wav
Expected behavior
No response
Logs
No response
Environment
docker
Additional context
No response
Adding --language_idx en gets past the language check, but inference then fails with a TypeError because no --speaker_wav was given:

tts --text "Text for TTS" \
    --model_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000 \
    --config_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000/config.json \
    --language_idx en \
    --out_path output02.wav

Using model: xtts
Text: Text for TTS
Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
  File "/opt/conda/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/home/TTS-dev/TTS/bin/synthesize.py", line 468, in main
    wav = synthesizer.tts(
  File "/home/TTS-dev/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/home/TTS-dev/TTS/tts/models/xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/TTS-dev/TTS/tts/models/xtts.py", line 480, in full_inference
    (gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/TTS-dev/TTS/tts/models/xtts.py", line 357, in get_conditioning_latents
    audio = load_audio(file_path, load_sr)
  File "/home/TTS-dev/TTS/tts/models/xtts.py", line 73, in load_audio
    audio, lsr = torchaudio.load(audiopath)
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 204, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_backend/ffmpeg.py", line 336, in load
    return load_audio(os.path.normpath(uri), frame_offset, num_frames, normalize, channels_first, format)
  File "/opt/conda/lib/python3.10/posixpath.py", line 340, in normpath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
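The bottom frame shows why a missing --speaker_wav surfaces as a bare TypeError rather than a clearer message: torchaudio's ffmpeg backend normalizes the path with os.path.normpath() before opening it, and os.fspath(None) raises. A minimal stdlib reproduction (load_reference is a stand-in for the path handling, not the actual TTS code):

```python
import os

def load_reference(speaker_wav):
    # torchaudio's ffmpeg backend calls os.path.normpath() on the reference path;
    # with speaker_wav=None, os.fspath(None) inside normpath raises TypeError.
    return os.path.normpath(speaker_wav)
```

So the error is triggered by the CLI passing speaker_wav=None through to the audio loader when no reference file is supplied.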
At least for me, specifying a reference clip with --speaker_wav <filepath to reference> worked. I used the same reference audio that I used during training.
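Putting the two fixes together, the invocation that should work is sketched below. The reference path is a placeholder, since the commenter's actual file is not given; the other paths are copied from the report:

```shell
tts --text "Text for TTS" \
    --model_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000 \
    --config_path /home/TTS-dev/recipes/ljspeech/xtts_v2/run/training/GPT_XTTS_v2.0_LJSpeech_FT-February-14-2024_05+17AM-0000000/config.json \
    --language_idx en \
    --speaker_wav /path/to/training_reference.wav \
    --out_path output02.wav
```

XTTS is a voice-cloning model, so it always needs a reference clip at inference time; neither the fine-tuned checkpoint nor the config supplies one implicitly.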