Unable to add Custom TTS model (i.e Coqui TTS)

Open akshatrocky opened this issue 1 year ago • 3 comments

I was unable to add a custom TTS model (e.g. Coqui TTS). I tried adding the model information to model.json, but it doesn't seem to work; maybe I am doing it wrong. What is the procedure for adding a custom TTS model to the Speech Note application? Thanks for making this great app for Linux :)

akshatrocky, Apr 17 '24 10:04

Hi. Thanks for the report.

As you probably know, you need to edit the ~/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json file and add a new entry with the model configuration.

This entry should be similar to the one below.

        {
            "name": "New cool voice",
            "model_id": "en_coqui_new_cool_model",
            "engine": "tts_coqui",
            "lang_id": "en",
            "checksum": "8bc7e85b",
            "checksum_quick": "50984d2b",
            "comp": "dir",
            "urls": [
                "file:///path/to/model/config.json",
                "file:///path/to/model/model.pth"
            ],
            "size": "100827994"
        },

A few important remarks:

  • model_id has to be unique
  • If the model files are located on your local drive, use the file:// URL type.
  • Add a URL for every file that the model needs (config.json and model.pth are just examples).
  • If your model uses a custom vocoder, you need to add it in the sups sub-object (for an example, see es_coqui_tacotron_mai in models.json).
  • To generate checksum and checksum_quick, use the --gen-checksums command line option: put empty strings in both checksum and checksum_quick, save the file, and run Speech Note with the --verbose --gen-checksums options:
flatpak run net.mkiol.SpeechNote --verbose --gen-checksums

The model will be downloaded automatically and the checksums should appear in the terminal.

[D] 18:15:52.802230735.802 0x7709dea87d00 () - all checksums were generated
models checksums:

"model_id": "fr_coqui_css100_vits",
"checksum": "a7671b81",
"checksum_quick": "7d7531cf",
"size": "100821187",

Let me know if any of this was helpful.
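
The remarks above can be sketched in Python as follows. This is only an illustration, not part of Speech Note: the helper names (file_url, add_model, parse_generated) are hypothetical, and the assumption that models.json keeps its entries in a top-level "models" list is not confirmed in this thread, so check your own file before relying on it. The sketch builds file:// URLs from local paths, refuses a duplicate model_id, leaves both checksums empty for the --gen-checksums run, and reads the generated values back from text shaped like the terminal output shown above.

```python
import re
from pathlib import PurePosixPath


def file_url(path: str) -> str:
    """Turn an absolute local path into a file:// URL."""
    return PurePosixPath(path).as_uri()


def add_model(data: dict, entry: dict) -> None:
    """Append entry to data["models"], rejecting a duplicate model_id."""
    # ASSUMPTION: entries live in a top-level "models" list.
    models = data.setdefault("models", [])
    if any(m.get("model_id") == entry["model_id"] for m in models):
        raise ValueError(f"model_id already in use: {entry['model_id']}")
    models.append(entry)


def parse_generated(output: str) -> dict:
    """Collect the "key": "value" pairs printed after a checksum run."""
    return dict(re.findall(r'"(\w+)":\s*"([^"]*)"', output))


entry = {
    "name": "New cool voice",
    "model_id": "en_coqui_new_cool_model",
    "engine": "tts_coqui",
    "lang_id": "en",
    "checksum": "",        # left empty so the checksum run fills it in
    "checksum_quick": "",  # left empty so the checksum run fills it in
    "comp": "dir",
    "urls": [
        file_url("/path/to/model/config.json"),
        file_url("/path/to/model/model.pth"),
    ],
    "size": "100827994",   # placeholder copied from the example entry
}

data = {"models": []}      # stand-in for the parsed models.json
add_model(data, entry)

# Text in the same shape as the terminal output quoted in this thread.
sample_output = '''
"model_id": "fr_coqui_css100_vits",
"checksum": "a7671b81",
"checksum_quick": "7d7531cf",
"size": "100821187",
'''
generated = parse_generated(sample_output)
```

To apply this to the real file, load it with json.load, call add_model, and write it back with json.dump; keeping a backup copy of models.json first is a sensible precaution.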

mkiol, Apr 18 '24 18:04

Thanks, this did work, but what about adding a custom multi-language model, e.g. a fine-tuned XTTS model? Do I have to add multiple model IDs for the different languages the XTTS model supports?

ghost, Apr 24 '24 11:04

XTTS? Nice :)

> custom multi-language model

For multilingual models you may use "model aliases". An alias is a copy of a model entry with some properties changed (the language, for instance). To create an alias, define a new model entry with the model_alias_of parameter. See the example below.

The model multilang_coqui_xtts203 is the base model. It is hidden from the user thanks to "hidden": true. This base model is used by the en_coqui_xtts203 and pt_coqui_br_xtts203 aliases.

        {
            "name": "Multilingual (Coqui XTTS-v2.0.3)",
            "model_id": "multilang_coqui_xtts203",
            "engine": "tts_coqui",
            "lang_id": "multilang",
            "checksum": "ae3c9981",
            "checksum_quick": "ce376c5d",
            "options": "xs",
            "features": [
                "tts_voice_cloning"
            ],
            "license": {
                "id": "CPML",
                "name": "Coqui Public Model License 1.0.0",
                "url": "https://coqui.ai/cpml.txt",
                "accept_required": true
            },
            "comp": "dir",
            "urls": [
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/model.pth",
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/config.json",
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/vocab.json"
            ],
            "size": "1868302897",
            "hidden": true
        },
        {
            "name": "English (Coqui XTTS-v2.0.3)",
            "model_id": "en_coqui_xtts203",
            "model_alias_of": "multilang_coqui_xtts203",
            "lang_id": "en"
        },
        {
            "name": "Português brasileiro (Coqui XTTS-v2.0.3)",
            "model_id": "pt_coqui_br_xtts203",
            "model_alias_of": "multilang_coqui_xtts203",
            "lang_id": "pt"
        },
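
For a fine-tuned model that covers many languages, writing the alias entries by hand gets repetitive. Below is a minimal Python sketch that generates them, assuming aliases need only the four fields shown in the example above (name, model_id, model_alias_of, lang_id). The function make_aliases, the base ID multilang_coqui_my_xtts, and the my_xtts suffix are hypothetical names chosen for illustration.

```python
import json


def make_aliases(base_id: str, stem: str, languages: dict) -> list:
    """Build one alias entry per language for a hidden base model.

    languages maps a lang_id (e.g. "en") to the display name of the alias.
    Each model_id is derived as "<lang_id>_<stem>", so it stays unique as
    long as the lang_ids differ and no existing model uses the same ID.
    """
    return [
        {
            "name": name,
            "model_id": f"{lang_id}_{stem}",
            "model_alias_of": base_id,
            "lang_id": lang_id,
        }
        for lang_id, name in languages.items()
    ]


aliases = make_aliases(
    base_id="multilang_coqui_my_xtts",  # hypothetical base model ID
    stem="coqui_my_xtts",               # hypothetical suffix for alias IDs
    languages={
        "en": "English (my fine-tuned XTTS)",
        "pt": "Português (my fine-tuned XTTS)",
    },
)

# Print the entries as JSON, ready to paste next to the base model entry.
print(json.dumps(aliases, indent=4, ensure_ascii=False))
```

The base entry itself still has to be written as in the example above, with its urls, checksums, and "hidden": true.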

mkiol, Apr 24 '24 18:04