FYI Sherpa-onnx in TTS-Wrapper

Open willwade opened this issue 1 year ago • 6 comments

Just a quick FYI..

I've VERY QUICKLY (so bugs beware) added sherpa-onnx to this python tts-wrapper..

https://github.com/willwade/tts-wrapper?tab=readme-ov-file#sherpa-onnx

We do fun things like listing available voices, auto-downloading models and running them.. then our abstract class deals with playing, pausing, streaming audio etc. Many aspects, like pitch, volume control, SSML and word events, we obviously can't handle with sherpa-onnx. But hey..

I'm really just focusing on MMS models right now.. I'd love a lovely JSON file of all TTS models we could use if anyone has it and I can add them.

It needs heaps of testing, and PRs are very welcome, particularly for rewriting the automated tests and checking/improving audio playback.. I feel there is a lag somewhere and I can't figure out where..

Use it like this:

pip install "tts-wrapper[sherpaonnx] @ git+https://github.com/willwade/tts-wrapper"

then


import time

from tts_wrapper import SherpaOnnxClient, SherpaOnnxTTS
try:
    client = SherpaOnnxClient(model_path=None, tokens_path=None)
    # or
    # client = SherpaOnnxClient(model_path=None, tokens_path=None, voice_id="eng")
    # where voice_id is an id from get_voices(), typically an ISO code from MMS
    # Initialize the TTS engine
    tts = SherpaOnnxTTS(client)

    # Get available voices
    voices = tts.get_voices()
    print("Available voices:", voices)

    # Set the voice using ISO code
    iso_code = "eng"  # Example ISO code for the voice
    tts.set_voice(iso_code)

    # Define the text to be synthesized
    text = "Hello, This is a word timing test"
    start_time = time.time()
    tts.speak(text)

except Exception as e:
    print(f"Error: {e}")

Take a look at play/pause/resume etc..

https://github.com/willwade/tts-wrapper?tab=readme-ov-file#streaming-and-playback-control

willwade • Jul 24 '24 22:07

I'm trying to support all the other models.. But to do that I could really do with a nicely formatted list of all available models.. So I'm trying this..


import requests
import json
import re

def get_github_release_assets(repo, tag):
    headers = {'Accept': 'application/vnd.github.v3+json'}
    
    # Get the release ID for the specified tag
    releases_url = f"https://api.github.com/repos/{repo}/releases/tags/{tag}"
    response = requests.get(releases_url, headers=headers)
    
    if response.status_code != 200:
        raise Exception(f"Failed to fetch release info for tag: {tag}")
    
    release_info = response.json()
    
    # Get the assets
    assets = []
    for asset in release_info.get('assets', []):
        filename = asset['name']
        asset_url = asset['browser_download_url']
        
        # Remove the file extension for further processing
        filename_no_ext = re.sub(r'(\.tar\.bz2|\.tar\.gz|\.zip)$', '', filename)
        parts = filename_no_ext.split('-')
        
        model_type = 'vits' if parts[0] == 'vits' else 'unknown'
        developer = parts[1] if len(parts) > 1 else 'unknown'
        
        if developer == 'zh':
            lang_code = 'zh'
            developer = parts[2] if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'
        else:
            lang_code = parts[2].replace('_', '-') if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'
        
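        # NOTE: for 'vits-zh-*' assets, developer was already reassigned above,
        # so this check takes the else branch and overwrites lang_code with parts[2].
        if developer == 'zh':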
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = 'unknown'
        else:
            lang_code = parts[2].replace('_', '-') if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'
        
        if len(parts) == 5:
            quality = parts[-1]

        if developer == 'zh' and len(parts) > 3:
            lang_code = 'zh'
            developer = parts[2]
            name = parts[3]
            quality = 'unknown'

        if len(parts) == 4 and developer == 'vctk':
            lang_code = 'unknown'
            name = 'unknown'
            quality = 'unknown'
        
        # Determine if the asset is compressed
        compression = filename.endswith(('.tar.bz2', '.tar.gz', '.zip'))
        
        # Add asset info to the list
        assets.append({
            'model_type': model_type,
            'developer': developer,
            'language_code': lang_code,
            'name': name,
            'quality': quality,
            'url': asset_url,
            'compression': compression
        })
    
    # Convert the list of assets to JSON
    assets_json = json.dumps(assets, indent=4)
    
    return assets_json

# Example usage
repo = "k2-fsa/sherpa-onnx"
tag = "tts-models"
assets_json = get_github_release_assets(repo, tag)
print(assets_json)


But it's not great.. e.g.


    {
        "model_type": "vits",
        "developer": "mimic3",
        "language_code": "gu-IN",
        "name": "cmu",
        "quality": "indic_low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-mimic3-gu_IN-cmu-indic_low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "ramona",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-ramona-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "high",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-high.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten_emotional",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten_emotional-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "el-GR",
        "name": "rapunzelina",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-el_GR-rapunzelina-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "en-GB",
        "name": "alan",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-alan-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "en-GB",
        "name": "alan",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-alan-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "keqing",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-keqing.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "theresa",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-theresa.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "zenyatta",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-zenyatta.tar.bz2",
        "compression": true
    }
    

any advice welcome on how those tts releases are formatted :)

willwade • Jul 25 '24 07:07

> any advice welcome on how those tts releases are formatted :)

I suggest that you handle the following kinds of models separately:

  • vits-zh-hf-xxx
  • sherpa-onnx-vits-xxx
  • icefall-vits-xxx
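
One way to act on that suggestion is to check for those three prefixes before falling back to generic hyphen splitting. The following is only a minimal sketch: the function name `parse_tts_asset_name`, the `family` field, and the assumed generic layout `vits-<developer>-<lang>-<name>-<quality>` are illustrative assumptions inferred from the filenames shown earlier in this thread, not an official naming specification.

```python
import re

# Archive extensions seen on the release assets in this thread.
_ARCHIVE_SUFFIX = re.compile(r"\.(tar\.bz2|tar\.gz|zip)$")

# Families whose remainder is kept as an opaque name (assumption: their
# layout is not described in this thread, so it is not split further).
_OPAQUE_PREFIXES = ("sherpa-onnx-vits-", "icefall-vits-")


def parse_tts_asset_name(filename):
    """Best-effort split of a TTS release asset filename into fields."""
    stem = _ARCHIVE_SUFFIX.sub("", filename)
    info = {
        "family": "unknown",
        "developer": "unknown",
        "language_code": "unknown",
        "name": "unknown",
        "quality": "unknown",
    }

    # vits-zh-hf-<name>: 'zh' is the language and 'hf' the source.
    if stem.startswith("vits-zh-hf-"):
        info.update(family="vits-zh-hf", developer="hf",
                    language_code="zh", name=stem[len("vits-zh-hf-"):])
        return info

    for prefix in _OPAQUE_PREFIXES:
        if stem.startswith(prefix):
            info.update(family=prefix.rstrip("-"), name=stem[len(prefix):])
            return info

    # Assumed generic layout: vits-<developer>-<lang>-<name>-<quality>
    parts = stem.split("-")
    if parts[0] == "vits" and len(parts) >= 5:
        info.update(family="vits-" + parts[1], developer=parts[1],
                    language_code=parts[2].replace("_", "-"),
                    name="-".join(parts[3:-1]), quality=parts[-1])
    return info
```

With this ordering, `vits-zh-hf-keqing.tar.bz2` gets `language_code` "zh" rather than "hf", and anything that matches none of the patterns keeps "unknown" in every field instead of being guessed at.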

csukuangfj • Jul 25 '24 09:07

@csukuangfj the models released under your tts tag release. What action generates those? I'm wondering about creating a json file of all models available

willwade • Oct 20 '24 11:10

> @csukuangfj the models released under your tts tag release. What action generates those? I'm wondering about creating a json file of all models available

Please see https://github.com/csukuangfj/models/tree/small-fixes/.github/workflows

Look at all the workflows at the above link whose names contain vits.
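
If it helps, those workflow files could be listed programmatically through GitHub's repository contents API. This is a rough sketch only: the helper names `filter_workflows` and `fetch_workflow_entries` are made up for illustration, and the repo and branch defaults are simply read off the link above.

```python
import json
import urllib.request


def filter_workflows(entries, keyword="vits"):
    """Return sorted names of files whose name contains the keyword."""
    return sorted(
        entry["name"]
        for entry in entries
        if entry.get("type") == "file" and keyword in entry["name"].lower()
    )


def fetch_workflow_entries(repo="csukuangfj/models", ref="small-fixes"):
    """Fetch the directory listing of .github/workflows (network call)."""
    url = (
        f"https://api.github.com/repos/{repo}"
        f"/contents/.github/workflows?ref={ref}"
    )
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `filter_workflows(fetch_workflow_entries())` would then give the candidate workflow filenames. Unauthenticated requests to the GitHub API are rate limited, so this is best run sparingly.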

csukuangfj • Oct 20 '24 14:10

Great. I think it's probably best that I take the hackier scraping approach:

https://github.com/willwade/tts-wrapper/blob/main/tts_wrapper/engines/sherpaonnx/publishedmodels.json

https://github.com/willwade/tts-wrapper/blob/main/tts_wrapper/engines/sherpaonnx/createindex.py

NB: Not including the 1107 MMS models

willwade • Oct 20 '24 21:10

NB: Correct link for all models, incl. kokoro: https://github.com/willwade/tts-wrapper/blob/main/tts_wrapper/engines/sherpaonnx/merged_models.json

willwade • Sep 04 '25 09:09