tts coqui xtts_v2 not working without speaker_idx

Open blob42 opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

I am trying to use the Coqui xtts_v2 model, but it seems unusable unless a speaker_idx is passed: without one, the API call does not generate any audio.

Describe the solution you'd like

  1. Either reuse the existing "voice" parameter to carry the speaker_idx: if its value is not an absolute path, treat it as a speaker name.
  2. Or add a new parameter to the API call.
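
Option 1 could look roughly like the sketch below. It is only an illustration of the proposed rule, not the current LocalAI backend code: the helper `resolve_voice` and the `VoiceSelection` type are hypothetical names, and the assumption is that an absolute path points to a reference audio file while anything else is a speaker name.

```python
import os
from typing import NamedTuple, Optional


class VoiceSelection(NamedTuple):
    """Result of interpreting the "voice" value (hypothetical type)."""
    speaker_wav: Optional[str]  # absolute path to a reference audio file
    speaker_idx: Optional[str]  # name of a speaker bundled with the model


def resolve_voice(voice: Optional[str]) -> VoiceSelection:
    """Apply the proposed rule: absolute path -> audio file, otherwise speaker name."""
    if not voice:
        # Nothing was passed; the caller decides on a default or returns an error.
        return VoiceSelection(speaker_wav=None, speaker_idx=None)
    if os.path.isabs(voice):
        return VoiceSelection(speaker_wav=voice, speaker_idx=None)
    return VoiceSelection(speaker_wav=None, speaker_idx=voice)
```

The backend would then pass whichever field is set to the model, so existing requests that use an absolute path in "voice" would keep working unchanged.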

Extra testing

I ran xtts_v2 manually with the speaker_idx set explicitly, and it appears to work.

I am willing to tackle this task.

It would also make sense to add an API endpoint that lists the available speakers.

blob42, Apr 19 '24 02:04

Hello! Thank you for submitting this feature request, and I'm happy to assist. Based on your description, it seems that the issue you're facing with the xtts_v2 Coqui model in LocalAI is related to the absence of audio generation when a speaker_idx is not passed. You would like to see two possible solutions implemented:

  1. Utilize the existing "voice" parameter to set the speaker_idx, treating its value as a speaker name whenever it is not an absolute path.
  2. Introduce a new parameter in the API call specifically for speaker_idx.

Additionally, you suggest that adding an API endpoint to list available speakers would also be a useful addition. You're willing to contribute to this task, which is great!

As a bot, an experiment of @mudler, I'll make sure to relay this information to the appropriate team for consideration. We appreciate your contribution to enhancing LocalAI!

localai-bot, Apr 19 '24 08:04

Is your feature request related to a problem? Please describe.

I am trying to use the xtts_v2 coqui model but it seems unusable unless a speaker_idx is passed. The api call does not generate any audio.

Describe the solution you'd like

  1. Either use the existing "voice" parameter and set the speaker_idx inside. If it is not an absolute path then it is considered a speaker.

I can confirm that voice isn't treated as an absolute path, so it would make sense to me to reuse it here (and just add documentation as needed).

mudler, Apr 19 '24 08:04