LocalAI
tts coqui xtts_v2 not working without speaker_idx
Is your feature request related to a problem? Please describe.
I am trying to use the xtts_v2 coqui model, but it seems unusable unless a speaker_idx is passed: without one, the API call does not generate any audio.
Describe the solution you'd like
- Either reuse the existing "voice" parameter to carry the speaker_idx: if its value is not an absolute path, it is treated as a speaker.
- Or add a new parameter to the API call.
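A minimal sketch of the first option, to make the proposed rule concrete. The helper name `resolve_voice` is hypothetical and not part of LocalAI. The returned keys `speaker_wav` and `speaker` are assumed to correspond to the keyword arguments of Coqui's Python API, and the speaker name is only an example value; the actual backend may name these differently.

```python
import os


def resolve_voice(voice: str) -> dict:
    """Map the request's "voice" value to synthesis keyword arguments.

    Hypothetical helper illustrating the proposal: an absolute path is
    treated as a reference audio file, anything else as a speaker name.
    """
    if not voice:
        # No voice given: pass nothing and let the backend decide.
        return {}
    if os.path.isabs(voice):
        # Absolute path: assumed to be a reference WAV for voice cloning.
        return {"speaker_wav": voice}
    # Otherwise treat the value as the name of a built-in speaker.
    return {"speaker": voice}


print(resolve_voice("/tmp/reference.wav"))
print(resolve_voice("Ana Florence"))
```

This keeps the API surface unchanged, at the cost of overloading one field with two meanings, which is why it would need to be documented.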
Extra testing
I tried running xtts_v2 manually with the speaker_idx set by hand, and it seems to work.
I am willing to tackle this task.
I guess it would also make sense to add an API endpoint that lists the available speakers.
Hello! Thank you for submitting this feature request, and I'm happy to assist. Based on your description, it seems that the issue you're facing with the xtts_v2 Coqui model in LocalAI is related to the absence of audio generation when a speaker_idx is not passed. You would like to see two possible solutions implemented:
- Utilize the existing "voice" parameter to set the speaker_idx internally, provided it's not considered an absolute path.
- Introduce a new parameter in the API call specifically for speaker_idx.
Additionally, you suggest that adding an API endpoint to list available speakers would also be a useful addition. You're willing to contribute to this task, which is great!
As a bot, an experiment of @mudler, I'll make sure to relay this information to the appropriate team for consideration. We appreciate your contribution to enhancing LocalAI!
Is your feature request related to a problem? Please describe.
I am trying to use the xtts_v2 coqui model but it seems unusable unless a speaker_idx is passed. The api call does not generate any audio.
Describe the solution you'd like
- Either use the existing "voice" parameter and set the speaker_idx inside. If it is not an absolute path then it is considered a speaker.
I can confirm voice isn't considered an absolute path, so it would make sense to me to reuse it here (and just add documentation as needed).