LocalAI
TTS API improvements
Description
Improvements to the Coqui TTS API/backend.
- [x] #2073: Allow passing speaker_id to models
- [x] Add optional `language` parameter to TTS endpoint/schema
- ~~[ ] TTS Info endpoint: List available models, speakers and languages~~ (will start a new PR for this one)
- [x] update swagger documentation
- [x] define tts models with config files
- [x] updated docs
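To illustrate the "define tts models with config files" item, a model definition could look roughly like the sketch below. The exact field names and file layout are assumptions based on LocalAI's usual YAML model-config style, not taken from this PR:

```yaml
# Hypothetical model config sketch -- field names are illustrative.
name: xtts_v2
backend: coqui
parameters:
  model: tts_models/multilingual/multi-dataset/xtts_v2
```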
Notes for Reviewers
Signed commits
- [x] Yes, I signed my commits.
Deploy Preview for localai canceled.
| Name | Link |
|---|---|
| Latest commit | b2361dcecda992f8894423954cbb80e8c39fba7a |
| Latest deploy log | https://app.netlify.com/sites/localai/deploys/6632c4cbace1270008899180 |
I don't see how the changeset can fix https://github.com/mudler/LocalAI/issues/2073 - is there something missing in the PR?
@mudler I didn't push those changes yet; I will remove the draft status when I'm done.
@mudler I am trying to understand where/when the Go gRPC server -> TTS service is used. Is this a work in progress?
I didn't push the swagger docs; regenerating them produced a lot of changes.
A quick way to test the language-switching capability with multilingual models is something like this:
Without specifying `lang`:
The voice uses an English accent.
```bash
curl -L http://localai:8080/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
  -d '{
    "backend": "coqui",
    "input": "Bonjour Madame ! Comment allez-vous ?",
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "voice": "Ana Florence"
  }' | aplay -D pipewire -
```
With `lang`:
The proper language accent is used.
```bash
curl -L http://localai:8080/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
  -d '{
    "backend": "coqui",
    "input": "Bonjour Madame ! Comment allez-vous ?",
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "voice": "Ana Florence",
    "lang": "fr"
  }' | aplay -D pipewire -
```
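The two requests differ only in the optional `lang` field. As a minimal client-side sketch, assuming the request schema shown in the curl examples (the helper name `build_tts_request` is hypothetical, not part of LocalAI):

```python
import json

def build_tts_request(text, model, voice, backend="coqui", lang=None):
    """Build a JSON body for the /tts endpoint; `lang` stays optional."""
    body = {"backend": backend, "input": text, "model": model, "voice": voice}
    if lang is not None:
        body["lang"] = lang  # language hint for multilingual models
    return json.dumps(body)

# French request, matching the second curl example above
payload = build_tts_request(
    "Bonjour Madame ! Comment allez-vous ?",
    "tts_models/multilingual/multi-dataset/xtts_v2",
    "Ana Florence",
    lang="fr",
)
```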
Quick update regarding the TTS Info endpoint: I am dropping this feature from this PR, as it would involve too many changes that are out of scope here.
Context:
The goal is to be able to query available models/speakers or other types of information, depending on the backend.
My first attempt was to add a gRPC service `TTSInfoRequest` to query the backend. I found out down the road that the backend gRPC service is loaded together with the model; however, Info requests might not carry any model information.
My proposal is to allow backend gRPC services to be spawned without a model and to add a service called `Info()` (or `Query`) that backends can use to send arbitrary information. A model could be loaded later using the same spawned service, or the service could be torn down and a new one started for the designated model.
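As a sketch only, the proposed RPC could take a shape like the following protobuf fragment. None of these message, field, or service names exist in LocalAI's backend proto; they only illustrate the idea of an info query that works before any model is loaded:

```proto
// Hypothetical sketch of the proposed Info/Query RPC.
message InfoRequest {}

message InfoResponse {
  repeated string models    = 1;  // models the backend can load
  repeated string speakers  = 2;  // available speaker ids
  repeated string languages = 3;  // supported language codes
}

service Backend {
  // Answerable without a model having been loaded first.
  rpc Info(InfoRequest) returns (InfoResponse) {}
}
```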
I will start a PR or Discussion for this proposal.
Overall looks good, thanks! Just a few nits/open questions above.