[Bug] Multiple speaker requests?

Open mukundt opened this issue 1 year ago • 4 comments

Describe the bug

The TTS API states that speaker_wav can be a list of filepaths for multiple speaker references. But in def tts_to_file(...), speaker_wav only accepts a single string.

To Reproduce

tts.tts_to_file(
    text="Some test",
    file_path="output.wav",
    speaker_wav=["training/1.wav"],
    language="en",
)

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1",
        "TTS": "0.20.6",
        "numpy": "1.26.2"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.11.6",
        "version": "Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000"
    }
}

Additional context

No response

mukundt, Nov 20 '23 18:11

Have you tried giving a list of strings?

erogol, Nov 28 '23 10:11

It works for me in this way:

tts.tts_to_file(
    text="Some test",
    file_path="output.wav",
    speaker_wav=["training/1.wav","training/2.wav","training/3.wav"],
    language="en",
)

But the order is important, because the main timbre comes from the first wav in the list. The other files seem to add features to the first voice.

What I would like to know is how to SAVE the "speaker" once I find the right list, so that it does not have to be analyzed again every time.
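One way to approach the saving question, independent of any particular TTS API, is to cache the result of the speaker analysis on disk, keyed by the ordered list of reference files. The sketch below is only an illustration under stated assumptions: `load_or_compute_speaker`, `speaker_cache_key`, and the `compute` callback are hypothetical names, and `compute` stands in for whatever step actually derives the speaker representation from the wav files. The key depends on file order, since the order of the list affects the resulting voice.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable, Sequence


def speaker_cache_key(wav_paths: Sequence[str]) -> str:
    """Build a key from the contents of the reference files, in list order."""
    digest = hashlib.sha256()
    for path in wav_paths:
        # Hash each file separately so the order of the list changes the key.
        digest.update(hashlib.sha256(Path(path).read_bytes()).digest())
    return digest.hexdigest()


def load_or_compute_speaker(
    wav_paths: Sequence[str],
    compute: Callable[[Sequence[str]], Any],
    cache_dir: str = "speaker_cache",
) -> Any:
    """Return the cached speaker data, computing and storing it on a miss.

    `compute` is a placeholder (an assumption of this sketch) for the
    expensive analysis step; its return value must be picklable.
    """
    cache_file = Path(cache_dir) / f"{speaker_cache_key(wav_paths)}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    result = compute(wav_paths)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

With XTTS, the `compute` callback would presumably wrap the model call that extracts the conditioning latents and speaker embedding from the reference audio, but that wiring is not shown in this thread and is left out here. Only load pickle files you created yourself, since unpickling untrusted data is unsafe.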

Zibri, Nov 30 '23 20:11

I was able to successfully generate audio using lists of strings of length 1 and of length 2.


from TTS.api import TTS


tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)

tts.tts_to_file(
    text="Some test text",
    file_path="output.wav",
    speaker_wav=["test_wavs/1.wav", "test_wavs/2.wav"],
    language="en",
)



tts.tts_to_file(
    text="Some test text",
    file_path="output.wav",
    speaker_wav=["test_wavs/1.wav"],
    language="en",
)

Try upgrading to TTS version 0.21.2.
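Before upgrading, it can help to check which version is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `is_at_least`, `installed_tts_version`) are hypothetical, and the parser only handles plain dotted numeric versions such as the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.21.2' into (0, 21, 2), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_tts_version() -> Optional[str]:
    """Return the installed version of the 'TTS' distribution, if any."""
    try:
        return version("TTS")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_tts_version()
    if current is None:
        print("TTS is not installed in this environment")
    elif is_at_least(current, "0.21.2"):
        print(f"TTS {current} is already at or above 0.21.2")
    else:
        print(f"TTS {current} is older than 0.21.2; consider upgrading")
```

For anything beyond simple numeric versions, a dedicated version parser would be a safer choice than this hand-rolled comparison.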

loganhart02, Nov 30 '23 21:11

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

stale[bot], Jan 02 '24 09:01