sherpa-onnx Support for Offline Phonemization Input in Sherpa-ONNX VITS TTS

Open anirpipi opened this issue 6 months ago • 4 comments

In the Sherpa-ONNX VITS-based TTS pipeline, we want to convert the input text into phonemes beforehand and pass the phoneme sequence directly to the ONNX model instead of the original text. For example:

text: "How are you doing today?"
phone_sequence: "haʊ ɑɹ ju ˈduɪŋ təˈdeɪ?"

We want to run this G2P conversion offline, then send the resulting phoneme sequence into the sherpa_onnx model, bypassing the internal G2P/tokenizer logic of piper_phonemizer. We know that, if we don't use the espeak-ng folder for on-the-fly IPA generation, we can create a lexicon.txt and use it instead, as shown in https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/offline-tts.py
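For reference, here is a minimal sketch of the lexicon.txt route mentioned above. It assumes that each lexicon line has the form "word token1 token2 ...", that tokens.txt lists one "symbol id" pair per line, and that the model uses one token per Unicode character of the IPA string. These format details should be checked against the actual model files. The helper names load_tokens and build_lexicon_lines are hypothetical and are not part of sherpa_onnx.

```python
def load_tokens(lines):
    """Collect the symbols from tokens.txt-style lines ("<symbol> <id>").

    Assumption: symbol and id are separated by a single space, with the id last.
    """
    symbols = set()
    for line in lines:
        line = line.rstrip("\n")
        parts = line.rsplit(" ", 1)  # split from the right: the id is the last field
        if len(parts) == 2 and parts[0] != "":
            symbols.add(parts[0])
    return symbols


def build_lexicon_lines(word_to_phonemes, symbols):
    """Turn precomputed phonemes into lexicon lines ("word tok1 tok2 ...").

    Assumption: one token per non-whitespace character of the phoneme string.
    Raises ValueError if a character is not in the model's symbol set.
    """
    lines = []
    for word, phonemes in word_to_phonemes.items():
        tokens = [ch for ch in phonemes if not ch.isspace()]
        unknown = [t for t in tokens if t not in symbols]
        if unknown:
            raise ValueError(f"{word!r}: symbols not in tokens.txt: {unknown}")
        lines.append(f"{word.lower()} {' '.join(tokens)}")
    return lines
```

Validating against tokens.txt up front is a deliberate choice here: it surfaces a mismatch between the offline G2P inventory and the model's symbol set before any audio is generated. Note that a word-level lexicon cannot express context-dependent pronunciations, which is part of why we are asking about direct phoneme input.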

We are looking at the code snippet below and trying to understand whether we can pass the phoneme sequence instead of text to tts.generate. Please suggest what modifications we need to make to the code:

import sherpa_onnx

# `args` comes from the argparse setup in the linked offline-tts.py example.
tts_config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model=args.vits_model,
            lexicon=args.vits_lexicon,
            data_dir=args.vits_data_dir,
            dict_dir=args.vits_dict_dir,
            tokens=args.vits_tokens,
        ),
    ),
)

tts = sherpa_onnx.OfflineTts(tts_config)
# Currently takes raw text; we would like to pass the phoneme sequence here.
audio = tts.generate(args.text, sid=args.sid, speed=args.speed)

May 29 '25 05:05 anirpipi
