TTS
[Bug] Multiple speaker requests?
Describe the bug
The TTS API states that speaker_wav can be a list of file paths for multiple speaker references. But in def tts_to_file(...), speaker_wav only accepts a single string.
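A minimal sketch of how calling code could handle both forms, assuming speaker_wav is meant to accept either a single path or a list of paths. The helper name normalize_speaker_wav is hypothetical and is not part of the TTS library.

```python
from typing import List, Sequence, Union


def normalize_speaker_wav(speaker_wav: Union[str, Sequence[str]]) -> List[str]:
    """Return the speaker references as a list, preserving their order.

    Hypothetical helper for illustration; not part of the TTS library.
    """
    if isinstance(speaker_wav, str):
        # A single path becomes a one-element list.
        return [speaker_wav]
    # Lists and tuples are copied into a new list in the same order.
    return list(speaker_wav)
```

With a wrapper like this, a single string and a one-element list end up being treated the same way before the value is handed on.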
To Reproduce
tts.tts_to_file(
    text="Some test",
    file_path="output.wav",
    speaker_wav=["training/1.wav"],
    language="en",
)
Expected behavior
No response
Logs
No response
Environment
{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1",
        "TTS": "0.20.6",
        "numpy": "1.26.2"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.11.6",
        "version": "Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000"
    }
}
Additional context
No response
Have you tried giving a list of strings?
It works for me in this way:
tts.tts_to_file(
    text="Some test",
    file_path="output.wav",
    speaker_wav=["training/1.wav", "training/2.wav", "training/3.wav"],
    language="en",
)
But the order is important, because the main timbre comes from the first wav in the list. It's as if the other ones add features to the first voice.
What I wish to know is how to SAVE the "speaker" once I find the right list, to save time so it does not have to analyze it every time.
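One way to avoid repeating the analysis is to cache on disk whatever the model computes for a given reference list. The sketch below only illustrates the caching idea: cached_speaker, compute_fn, and cache_dir are hypothetical names, and compute_fn stands in for whichever call extracts the speaker data in your TTS version. The cache key depends on the order of the paths, since the first wav sets the main timbre.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable, Sequence


def cached_speaker(
    wav_paths: Sequence[str],
    compute_fn: Callable[[Sequence[str]], Any],
    cache_dir: str = "speaker_cache",
) -> Any:
    """Compute speaker data once per ordered reference list, then reuse it.

    Hypothetical helper; compute_fn is a placeholder for the real
    speaker-analysis call and must return a picklable object.
    """
    # Order-sensitive key: ["a.wav", "b.wav"] and ["b.wav", "a.wav"] differ.
    key = hashlib.sha256("\n".join(wav_paths).encode("utf-8")).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.pkl"

    # Reuse the stored result if this exact list was analyzed before.
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # Otherwise run the expensive analysis and store its result.
    result = compute_fn(list(wav_paths))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

The key is built from the paths only, so if a reference file is replaced under the same name the cached entry goes stale and should be deleted by hand. Whether the cached object can be fed back into synthesis directly depends on the model and the installed TTS version, so that wiring is left out here.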
I was able to successfully generate audio using lists of strings of length 1 and length 2.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)

tts.tts_to_file(
    text="Some test text",
    file_path="output.wav",
    speaker_wav=["test_wavs/1.wav", "test_wavs/2.wav"],
    language="en",
)

tts.tts_to_file(
    text="Some test text",
    file_path="output.wav",
    speaker_wav=["test_wavs/1.wav"],
    language="en",
)
Try upgrading to TTS version 0.21.2.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.