TTS API improvements

Open blob42 opened this issue 4 months ago • 7 comments

Description

Improvements to the Coqui TTS API/backend.

  • [x] #2073: Allow passing speaker_id to models
  • [x] Add optional language parameter to TTS endpoint/schema
  • ~[ ] TTS Info endpoint: List available models, speakers and languages~ (will start new PR for this one)
  • [x] update swagger documentation
  • [x] define tts models with config files
  • [x] updated docs

Notes for Reviewers

Signed commits

  • [x] Yes, I signed my commits.

blob42 avatar Apr 20 '24 13:04 blob42

Deploy Preview for localai canceled.

Latest commit: b2361dcecda992f8894423954cbb80e8c39fba7a
Latest deploy log: https://app.netlify.com/sites/localai/deploys/6632c4cbace1270008899180

netlify[bot] avatar Apr 20 '24 13:04 netlify[bot]

I don't see how the changeset can fix https://github.com/mudler/LocalAI/issues/2073 - is there something missing in the PR?

mudler avatar Apr 20 '24 18:04 mudler

@mudler I haven't pushed those changes yet. I will remove the draft status when I am done.

blob42 avatar Apr 22 '24 05:04 blob42

@mudler I am trying to understand where and when the Go gRPC server -> TTS service is used. Is this a work in progress?

blob42 avatar Apr 22 '24 12:04 blob42

I didn't push the Swagger docs because regenerating them produced a lot of unrelated changes.

A quick way to test the language-switching capability of multilingual models is something like this:

Without specifying lang:

The voice uses an English accent.

curl -L http://localai:8080/tts \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
    -d '{
"backend": "coqui",
"input": "Bonjour Madame ! Comment allez-vous ?",
"model": "tts_models/multilingual/multi-dataset/xtts_v2",
"voice": "Ana Florence"
}' | aplay -D pipewire -

With lang:

The proper accent for the language is used.

curl -L http://localai:8080/tts \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
    -d '{
"backend": "coqui",
"input": "Bonjour Madame ! Comment allez-vous ?",
"model": "tts_models/multilingual/multi-dataset/xtts_v2",
"voice": "Ana Florence",
"lang": "fr"
}' | aplay -D pipewire -
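For scripted testing, the two request bodies above can be built from one helper. This is only a minimal sketch in Python: the field names (`backend`, `input`, `model`, `voice`, `lang`) are taken from the curl examples, while the function name `build_tts_payload` is hypothetical and not part of LocalAI. It assumes the server treats an absent `lang` key the same way as in the first curl call.

```python
import json
from typing import Optional


def build_tts_payload(
    text: str,
    model: str,
    voice: str,
    backend: str = "coqui",
    lang: Optional[str] = None,
) -> dict:
    """Build the JSON body for a POST to /tts (hypothetical helper).

    The "lang" key is only included when a language is given, which
    mirrors the difference between the two curl calls above.
    """
    payload = {
        "backend": backend,
        "input": text,
        "model": model,
        "voice": voice,
    }
    if lang is not None:
        payload["lang"] = lang
    return payload


MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"
TEXT = "Bonjour Madame ! Comment allez-vous ?"

without_lang = build_tts_payload(TEXT, MODEL, "Ana Florence")
with_lang = build_tts_payload(TEXT, MODEL, "Ana Florence", lang="fr")

# ensure_ascii=False keeps accented characters readable in the output.
print(json.dumps(with_lang, ensure_ascii=False, indent=2))
```

The resulting string can be passed to `curl -d` or sent with any HTTP client. Keeping `lang` optional means existing callers that never set it continue to send the same body as before.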

blob42 avatar Apr 23 '24 13:04 blob42

A quick update regarding the TTS Info endpoint: I am leaving this feature out of this PR, as it would involve too many changes that are out of scope here.

Context:

The goal is to make it possible to query available models, speakers, or other types of information, depending on the backend.

My first attempt was to add a gRPC service, TTSInfoRequest, to query the backend. I later found out that the backend gRPC service is loaded at the same time as the model, whereas Info requests might not carry any model information.

My proposal is to allow a backend's gRPC service to be spawned without a model, and to add a service called Info() or Query that backends can use to send arbitrary information. A model could then be loaded later using the same spawned service, or the service could be torn down and a new one started for the designated model.
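To make the proposed lifecycle concrete, here is a rough sketch in plain Python, with no gRPC involved. Every name in it (`BackendService`, `info`, `load_model`, `tts`) is hypothetical and only illustrates the idea; it is not the existing LocalAI backend interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BackendService:
    """Hypothetical backend process that can start before any model is loaded."""

    name: str
    # Catalog the backend can report without loading anything.
    available_models: List[str] = field(default_factory=list)
    loaded_model: Optional[str] = None

    def info(self) -> dict:
        """Answer an Info/Query call; works with or without a loaded model."""
        return {
            "backend": self.name,
            "models": list(self.available_models),
            "loaded_model": self.loaded_model,
        }

    def load_model(self, model: str) -> None:
        """Load a model into the already running service."""
        if model not in self.available_models:
            raise ValueError(f"unknown model: {model}")
        self.loaded_model = model

    def tts(self, text: str) -> str:
        """Placeholder for synthesis; requires a model to be loaded first."""
        if self.loaded_model is None:
            raise RuntimeError("no model loaded; call load_model() first")
        return f"[{self.loaded_model}] {text}"


# The service is spawned without a model and can already answer Info requests.
svc = BackendService("coqui", ["tts_models/multilingual/multi-dataset/xtts_v2"])
print(svc.info())

# The model is loaded later, reusing the same spawned service.
svc.load_model("tts_models/multilingual/multi-dataset/xtts_v2")
print(svc.info()["loaded_model"])
```

The alternative mentioned above, tearing the service down and starting a new one for the designated model, would replace the `load_model` call with creating a fresh `BackendService` instance.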

I will start a PR or Discussion for this proposal.

blob42 avatar Apr 29 '24 15:04 blob42

Overall looks good, thanks! Just a few nits/open questions above.

mudler avatar Apr 29 '24 20:04 mudler