sherpa-onnx

add phoonnx models

Open JarbasAl opened this issue 2 months ago • 5 comments

hello

I am working on my own TTS engine, https://github.com/TigreGotico/phoonnx

when using the espeak phonemizer the models are compatible with piper TTS; in fact, you are already using my models (#2530)

with my latest training code the .json changed slightly, so I thought it was time to open an issue about phoonnx

new model:

  • https://huggingface.co/OpenVoiceOS/phoonnx_ar-SA_miro_espeak

please note phoonnx is in its early days and I wouldn't exactly consider it production ready, but it works!

the various phonemizers are still undergoing testing and will need to be taken into account if sherpa decides to support the non-espeak-based models

import json
from typing import Any, Dict

import onnx


def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    """Add meta data to an ONNX model. It is changed in-place.

    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)


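def load_config(model):
    # The config sits next to the model, e.g. "miro_ar-SA.onnx.json".
    # Read it as UTF-8 explicitly, since phoneme_id_map contains IPA symbols.
    with open(f"{model}.json", "r", encoding="utf-8") as file:
        config = json.load(file)
    return config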


def generate_tokens(config):
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            if s == "\n":  # skip: a newline cannot be stored in a line-based file
                continue
            f.write(f"{s} {i}\n")
    print("Generated tokens.txt")


def main():

    filename = "miro_ar-SA.onnx"

    config = load_config(filename)

    alphabet = config["alphabet"]
    phonemizer = config["phoneme_type"]
    if alphabet != "ipa" or phonemizer != "espeak":
        raise RuntimeError("only phoonnx models trained with 'ipa' and 'espeak' are supported")

    print("generate tokens")
    generate_tokens(config)

    print("add model metadata")
    meta_data = {
        "model_type": "vits",
        "comment": "piper",  # NOTE: only phoonnx models trained using espeak + ipa
        "language": "Arabic",
        "voice": config["lang_code"],  # e.g., en-us
        "has_espeak": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
    }
    print(meta_data)
    add_meta_data(filename, meta_data)


if __name__ == "__main__":
    main()
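A quick way to sanity-check the generated tokens.txt is to read it back. The sketch below is not part of phoonnx or sherpa-onnx; `parse_tokens` and `find_duplicate_ids` are hypothetical helper names, and it assumes every value in `phoneme_id_map` is a single integer, which is what the script above writes out. It splits each line on the last space, so a symbol that is itself a space is preserved.

```python
from collections import Counter
from typing import Dict, List


def parse_tokens(text: str) -> Dict[str, int]:
    """Parse tokens.txt content ("<symbol> <id>" per line) into a symbol -> id map."""
    table: Dict[str, int] = {}
    # Split on "\n" only; str.splitlines() would also split on other
    # line-boundary characters that could appear as symbols.
    for line in text.split("\n"):
        if not line:
            continue
        # Split on the LAST space so that the space symbol itself survives.
        symbol, sep, token_id = line.rpartition(" ")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        table[symbol] = int(token_id)
    return table


def find_duplicate_ids(table: Dict[str, int]) -> List[int]:
    """Return the ids that are assigned to more than one symbol, sorted."""
    counts = Counter(table.values())
    return sorted(i for i, n in counts.items() if n > 1)


if __name__ == "__main__":
    with open("tokens.txt", "r", encoding="utf-8") as f:
        tokens = parse_tokens(f.read())
    print(f"{len(tokens)} tokens, duplicate ids: {find_duplicate_ids(tokens)}")
```

If a model's .json stores ids in another shape (for example as lists), the `int(...)` conversion fails loudly rather than producing a silently wrong table.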

JarbasAl, Sep 19 '25 14:09

Arabic female model

https://huggingface.co/OpenVoiceOS/phoonnx_ar-SA_dii_espeak

JarbasAl, Sep 30 '25 14:09

Arabic male V2 model, trained on a better dataset

https://huggingface.co/OpenVoiceOS/phoonnx_ar-SA_miro_espeak_V2

We also published a guest blog post about our collaboration with visually impaired Arabic users to create these models: https://blog.openvoiceos.org/posts/2025-10-01-arabic_tts_collaboration

JarbasAl, Oct 01 '25 12:10

with this PR, https://github.com/TigreGotico/phoonnx/pull/19, models should be sherpa compatible out of the box

JarbasAl, Oct 03 '25 17:10

Basque male voice: https://huggingface.co/OpenVoiceOS/phoonnx_eu-ES_miro_espeak

the model itself should already include the metadata keys expected by sherpa, and tokens.txt is provided directly in the same repo
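To confirm that a downloaded model carries that metadata, something like the following could be used. This is only a sketch: the key list is copied from the conversion script earlier in this thread, not from the sherpa-onnx documentation, so sherpa may expect more or fewer keys. `missing_keys` and `read_metadata` are hypothetical helper names; `onnx.load` and `metadata_props` are the same calls the conversion script uses.

```python
from typing import Dict, List

# Keys written by the conversion script in this thread (an assumption about
# what sherpa expects, not an authoritative list).
EXPECTED_KEYS = (
    "model_type",
    "comment",
    "language",
    "voice",
    "has_espeak",
    "n_speakers",
    "sample_rate",
)


def missing_keys(meta: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent from the metadata map."""
    return [k for k in EXPECTED_KEYS if k not in meta]


def read_metadata(path: str) -> Dict[str, str]:
    """Read the key/value metadata stored in an ONNX model file."""
    import onnx  # imported lazily so missing_keys() works without onnx installed

    model = onnx.load(path)
    return {p.key: p.value for p in model.metadata_props}


if __name__ == "__main__":
    import sys

    # Usage: python check_meta.py path/to/model.onnx
    absent = missing_keys(read_metadata(sys.argv[1]))
    print("missing keys:", absent if absent else "none")
```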

JarbasAl, Oct 05 '25 16:10

Basque female voice: https://huggingface.co/OpenVoiceOS/phoonnx_eu-ES_dii_espeak

samples for both Basque voices are here: https://blog.openvoiceos.org/posts/2025-10-06-phoonnx

JarbasAl, Oct 06 '25 16:10