xtts-api-server
How do we get fine tuned models to show up in speaker list?
I suspect that SillyTavern either hasn't been updated to support fine-tuned models yet or has a bug that keeps them out of the speaker list (ticket here: https://github.com/SillyTavern/SillyTavern/issues/1657). But just in case I am doing it wrong, here is how I am loading the XTTS API server. First of all, I updated it via pip (before the update it did not recognize the models folder flag -mf; after the update it does).
The full bat file I am using to load it is:
cd xtts
call venv\Scripts\activate
python -m xtts_api_server -mf C:\NewAI\SillyTavern\SillyTavern\xtts\models -sf C:\NewAI\SillyTavern\SillyTavern\xtts\speakers2 --streaming-mode-improve --deepspeed --stream-play-sync
The speakers2 folder was a test with just one wav in it, rather than messing with my current, rather full speakers folder. The fine-tuned voice is in the models folder, in its own folder: models, then NarratorNew. Inside the NarratorNew folder are: config.json, model.pth (a VERY large file), reference.wav, speakers_xtts.pth, vocab.json.
When fine-tuning in the webui this model worked fine (great, in fact), but without instructions on how to move it to an API server install I somewhat guessed, so it's possible I got it wrong. Am I missing something, or is SillyTavern the issue? (It only shows the wav file from the speakers2 folder.)
For now I will continue fine-tuning, as I have a fair few I want to do.
Also, how do we format the JSON packet when sending requests to the server manually, so that it uses a specific fine-tuned model?
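A quick way to rule out a folder-layout problem is to check the model folder against the file list above. This is only a sketch: the file names come from the listing in this post, and whether the server actually requires every one of them is an assumption. `missing_model_files` is a hypothetical helper, not part of xtts-api-server.

```python
from pathlib import Path

# File names copied from the folder listing in this post.
# ASSUMPTION: the server may not need all of them (e.g. reference.wav).
EXPECTED_FILES = (
    "config.json",
    "model.pth",
    "vocab.json",
    "speakers_xtts.pth",
    "reference.wav",
)


def missing_model_files(models_dir, model_name):
    """Return the expected files that are absent from models_dir/model_name."""
    model_dir = Path(models_dir) / model_name
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

Calling `missing_model_files(r"C:\NewAI\SillyTavern\SillyTavern\xtts\models", "NarratorNew")` returns an empty list when every listed file is present.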
Currently I am using:
@echo off
setlocal enabledelayedexpansion
rem Set the API endpoint and function
set API_ENDPOINT=http://localhost:8020/tts_to_file
rem Set the input values
rem set SPEAKER_WAV="dave2.wav"
set SPEAKER_WAV="stanlyNarrator.wav"
set LANGUAGE="en"
set FILE_NAME_OR_PATH="narrator.wav"
rem Check if a file is dropped onto the batch file
if "%~1" neq "" (
set "TEXT_FILE=%~1"
) else (
echo No text file dropped. Exiting.
exit /b
)
rem Read the contents of the dropped text file into the TEXT variable
set "TEXT="
for /f "delims=" %%i in ('type "%TEXT_FILE%"') do (
set "LINE=%%i"
rem Escape special characters in the line
set "LINE=!LINE:"=\"!"
set "TEXT=!TEXT!!LINE! "
)
rem Trim trailing whitespace
set "TEXT=!TEXT:~0,-1!"
rem Build the JSON payload
set JSON_PAYLOAD={^
"text": "!TEXT!",^
"speaker_wav": %SPEAKER_WAV%,^
"language": %LANGUAGE%,^
"file_name_or_path": %FILE_NAME_OR_PATH%^
}
rem Write the JSON payload to a temporary file
echo %JSON_PAYLOAD% > temp.json
rem Make the curl request
curl -v -X POST -H "Content-Type: application/json" -d @temp.json %API_ENDPOINT%
rem Remove the temporary file
del temp.json
But naturally set SPEAKER_WAV="stanlyNarrator.wav" needs to be changed to point at the fine-tuned model, and I'm not sure what format to use there (which could also be the SillyTavern issue, lol).
Hi, at the moment you can't switch the XTTS model via SillyTavern.
Currently, to use a custom model you need to add the flag -v {MODEL NAME}, where MODEL NAME is the name of the model's folder inside the models folder.
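For comparison, the same request can be sketched in Python, where `json.dumps` takes care of escaping quotes and other special characters in the text. The endpoint and the four field names are the ones used in the batch file above. The helper names `build_payload` and `make_request` are made up for illustration, and this does not answer which `speaker_wav` value selects a fine-tuned model.

```python
import json
import urllib.request

API_ENDPOINT = "http://localhost:8020/tts_to_file"


def build_payload(text, speaker_wav, language="en", file_name_or_path="narrator.wav"):
    """Build the JSON body with the same four fields as the batch script."""
    # Collapse all whitespace runs, including line breaks, into single spaces.
    joined = " ".join(text.split())
    body = {
        "text": joined,
        "speaker_wav": speaker_wav,
        "language": language,
        "file_name_or_path": file_name_or_path,
    }
    return json.dumps(body).encode("utf-8")


def make_request(payload, url=API_ENDPOINT):
    """Wrap the payload in a POST request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, read the text file, then pass `make_request(build_payload(text, "stanlyNarrator.wav"))` to `urllib.request.urlopen`.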
I also prepared some endpoints in a recent update, maybe later they can be added to SillyTavern.
GET http://127.0.0.1:8020/get_models_list - gets a list of all available models
POST http://127.0.0.1:8020/switch_model - switches to the model we pass in
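A minimal sketch of calling these two endpoints from Python. The URLs are the ones listed above. The request body for switch_model is not spelled out in this thread, so the `model_name` key is an assumption, as is the idea that both endpoints reply with JSON. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8020"


def models_list_request(base_url=BASE_URL):
    """Build the GET request for the list of available models."""
    return urllib.request.Request(f"{base_url}/get_models_list", method="GET")


def switch_model_request(model_name, base_url=BASE_URL):
    """Build the POST request that asks the server to switch models."""
    # ASSUMPTION: the body key "model_name" is not stated in this thread.
    body = json.dumps({"model_name": model_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/switch_model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(models_list_request())` would fetch the model list and `send(switch_model_request("NarratorNew"))` would request the switch.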
Ah damn, the fine-tuned models came out REALLY well. Hopefully SillyTavern will implement these endpoints.
Is there a way to have it load the fine-tuned model and ignore the speaker_wav field of the JSON packet?
Using my example of the NarratorNew fine-tuned model: I would load an API instance with that as the model, and it would then always use the fine-tuned voice no matter which speaker wav is selected. That would be a good workaround for my use case until SillyTavern supports fine-tuned models (especially since it may never support them). I could then create launch bat files for each model and use them as needed.
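The "one launcher per model" idea can be sketched as a small Python helper that builds the same command line as the bat file, with the -v flag added. The flags -mf, -sf and -v are the ones mentioned in this thread; `server_command` and `launch` are hypothetical names, and using `sys.executable` assumes the script runs inside the same venv as the server. It does not change how speaker_wav is handled.

```python
import subprocess
import sys


def server_command(model_name, models_dir, speakers_dir, extra_flags=()):
    """Build the argument list for starting the server with one model."""
    return [
        sys.executable, "-m", "xtts_api_server",
        "-mf", str(models_dir),    # folder that contains the model folders
        "-sf", str(speakers_dir),  # folder that contains the speaker wavs
        "-v", model_name,          # name of the model's folder
        *extra_flags,
    ]


def launch(model_name, models_dir, speakers_dir, extra_flags=()):
    """Start the server as a child process and return the Popen handle."""
    return subprocess.Popen(
        server_command(model_name, models_dir, speakers_dir, extra_flags)
    )
```

For example, `launch("NarratorNew", r"C:\NewAI\SillyTavern\SillyTavern\xtts\models", r"C:\NewAI\SillyTavern\SillyTavern\xtts\speakers2", ["--deepspeed"])` mirrors the bat file above for a single model.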
Hmmm, am I confused here? GamingDaveUK talks about wanting to use a trained voice within SillyTavern. It looks like your (@daswer123) reply was talking about TTS models? I'm having this exact problem. Trained voices sound great in the webui. I drop the trained folder (config.json, model.pth, reference.json, reference.wav, speakers_xtts.pth, vocab.json) into the speakers folder within the SillyTavern extension folder (xtts/speakers/) and rename the folder, but within SillyTavern the speaker sounds American and nothing like the original trained voice. To confirm: @daswer123, when you say model, are you referring to the model/v2.0.2? @GamingDaveUK, when you say model, are you referring to the speaker/trained voice?
He showed how we load the models, and also said that SillyTavern does not support them.
However, my question extends beyond SillyTavern. I cannot figure out how to use the models once loaded, as the server still needs and uses a speaker wav file, even in the webui.
The files you mention need to be in a folder inside the models folder (you can create and specify it... I'm not at my PC, so I can't say the exact settings you need).
Then you load it with -v followed by the folder name (again, not at my PC, so I'm not sure that's fully correct, but it's in the docs).
Then it loads your trained model... However, whether you send a JSON packet its way, send a SillyTavern packet, or use the webui, you also have to pick a speaker wav file from the speakers folder, and this overrides the fine-tuning. It's a right damn pain.