Piper as Library?

Open tim-gromeyer opened this issue 2 years ago • 12 comments

Can Piper be used as a library directly in an application, or is it necessary to create a wrapper for the Piper executable?

tim-gromeyer avatar Oct 01 '23 19:10 tim-gromeyer

TL;DR: There is a Python module for Piper, but I haven't found any documentation for it, so I've written a simple one-script wrapper around it. There is also a more feature-rich library called dimits (https://pypi.org/project/dimits/), which I did not make and which is a bit buggy.

Hello, I assume from this issue that you want to use Piper's text to speech in a Python script. There is indeed already an official Python module for Piper, but I haven't been able to find any documentation for it. I can think of two ways to get Piper's text to speech into a Python script:

  1. Using the dimits library https://pypi.org/project/dimits/ (I didn't make this.)
  2. By using a simple script I made

If you want to use the dimits library, I recommend reading through the guide on its PyPI page: https://pypi.org/project/dimits/.

If you would rather pick and choose the functions you need from the wrapper, or import it as a module, first install the official Piper package from PyPI, plus pygame to play the generated audio files cross-platform: pip install piper-tts pygame. Then choose a voice from https://rhasspy.github.io/piper-samples/ and download it. The script is below:

from piper.voice import PiperVoice as piper #Backbone of text to speech
import wave #Writing text to speech to wave files
from sys import platform
from os import remove
from os import environ

environ['PYGAME_HIDE_SUPPORT_PROMPT'] = "hide" #Stop pygame from saying hello in the console

from pygame import mixer

mixer.init()

#Check if the operating system is MacOS
if platform == 'darwin':
    print('This library cannot be used on MacOS yet, due to piper not being supported there.')
    exit() #Piper isn't available on MacOS yet

model = None #Set the model to none at the start
voice = None #Set the voice to none at the start

#Function to load and set the voice model to be used
def load(model_set):
    global model
    global voice
    
    if model_set.endswith('.onnx'): #Does the filename already end in .onnx? If not, add it below.
        model = model_set #Set model variable as is.
    else:
        model = model_set + '.onnx' #Set model variable and append .onnx.
    
    #Try to load the model
    try:
        voice = piper.load(model) #Load the model
    except Exception: #A bare except would also swallow Ctrl+C and exit()
        print("Something went wrong, did you type the correct name for the model?")
        exit()

#Function to save a text to speech file to disk
def save(text, file_name, model_set=None): #model_set is an optional one-off model path
    global model
    global voice
    
    if model_set is None and model is None: #Check if a model has been set, if not then exit.
        print("No model was set! Please set a model using the load function: load(\"your_model_here\")")
        exit()
    elif model_set is not None: #Is there a voice that should be used only once?
        temp_voice = piper.load(model_set)
        with wave.open(file_name, "wb") as wav_file:
            temp_voice.synthesize(text, wav_file)
    else: #If not, use the voice that was set earlier.
        with wave.open(file_name, "wb") as wav_file:
            voice.synthesize(text, wav_file)

#Function to save the file to disk, play it on the speakers, then delete the file.
def say(text, model_set=None): #model_set is an optional one-off model path
    global model
    global voice
    
    if model_set is None and model is None: #Check if a model has been set, if not then exit.
        print("No model was set! Please set a model using the load function: load(\"your_model_here\")")
        exit()
    elif model_set is not None: #Is there a voice that should only be used once?
        temp_voice = piper.load(model_set)
        with wave.open('tmp_text_2_speech.wav', "wb") as wav_file:
            temp_voice.synthesize(text, wav_file)
    else: #If not, use the voice that was set earlier.
        with wave.open('tmp_text_2_speech.wav', "wb") as wav_file:
            voice.synthesize(text, wav_file)
    
    #Play the file
    mixer.music.load('tmp_text_2_speech.wav')
    mixer.music.set_volume(1)
    mixer.music.play()

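    from time import sleep #Imported here so the wait loop below does not spin the CPU
    while mixer.music.get_busy(): #Wait until playback has finished
        sleep(0.05)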
    
    remove("tmp_text_2_speech.wav") #Remove the temporary file

A simple example of using this script is:

import whatever_you_named_the_script as engine
engine.load('ModelNameHere')
engine.say('I want to eat ice cream after this.')

Good luck, and if I was wildly off topic and wrong in my answer please let me know, and I'll try to help.

mario872 avatar Oct 07 '23 10:10 mario872

Nice, thanks for the script. I was looking for a C++ library, but maybe looking at the source of piper-tts will help. Thanks anyway!

tim-gromeyer avatar Oct 07 '23 11:10 tim-gromeyer

Okay, I only know a little bit about C++, but I believe you will be able to either use or take inspiration from the src/cpp/ directory in this repo. main.cpp also seems to have a command-line interface that you could pull apart. Good luck!

mario872 avatar Oct 07 '23 19:10 mario872

This project may be of interest: https://voicedock.app/apps/ttspiper/ - wraps Piper with a gRPC API (I haven't used it myself)

porjo avatar Nov 05 '23 02:11 porjo

Any progress on this? I am interested because for now I'm using system, but I'd like to avoid it :S
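
While no library API is available, one way to at least avoid going through a shell is to spawn the executable directly and feed it the text on stdin. The following is only a minimal Python sketch of that idea: build_piper_command and run_piper are hypothetical helper names, and the model and output paths are placeholders. The --model and --output_file flags are the ones shown in the Piper README.

```python
import subprocess


def build_piper_command(executable, model_path, output_path):
    """Return the argument list for one Piper run (no shell involved)."""
    # --model and --output_file are the flags used in the Piper README example.
    return [executable, "--model", model_path, "--output_file", output_path]


def run_piper(text, executable="piper",
              model_path="en_US-lessac-medium.onnx", output_path="out.wav"):
    """Send `text` to the Piper executable on stdin and wait for it to finish."""
    command = build_piper_command(executable, model_path, output_path)
    # check=True raises CalledProcessError if Piper exits with a non-zero status.
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return output_path
```

Passing the arguments as a list means the text and file names are never interpreted by a shell, which is the main practical difference from a system-style call.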

WilsonRoblesTafco avatar Nov 10 '23 09:11 WilsonRoblesTafco

Maybe you will find the project https://github.com/k2-fsa/sherpa-onnx interesting.

It supports all VITS models from Piper, and we have converted all of the English/French/German/Spanish models from Piper to sherpa-onnx. You can try them by visiting the following Hugging Face space: https://huggingface.co/spaces/k2-fsa/text-to-speech

There are also pre-built Android APKs for it. Please see https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

If you want to use sherpa-onnx in your own application, we provide APIs for C++/C/Python/Go/Swift/Kotlin/C#, etc. You can build sherpa-onnx as either a static or a dynamic library. It supports Linux, macOS, and Windows.

For instance, you can follow https://k2-fsa.github.io/sherpa/onnx/c-api/index.html to use the C API of sherpa-onnx.

https://github.com/k2-fsa/sherpa-onnx/blob/master/c-api-examples/offline-tts-c-api.c is an example using the C API of sherpa-onnx for TTS.

csukuangfj avatar Nov 10 '23 09:11 csukuangfj

If you want a minimal version of @mario872 's answer:

from piper.voice import PiperVoice as piper
import wave
voice = piper.load("en_US-lessac-medium.onnx")
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize("This is a test", wav_file)
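
To sanity-check the file this produces, the standard-library wave module can read it back. A small sketch, where wav_duration_seconds is a hypothetical helper name and nothing Piper-specific is assumed:

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    # Duration is the number of audio frames divided by frames per second.
    return frame_count / float(frame_rate)
```

For example, wav_duration_seconds("test.wav") should return a small positive number if synthesis actually wrote audio.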

thiswillbeyourgithub avatar May 07 '24 13:05 thiswillbeyourgithub

The next version of Piper will be usable via a C API and a Python API.

synesthesiam avatar May 08 '24 00:05 synesthesiam

Great to hear. Also, is there currently a way to pass in multiple sentences via Python? The README says it can be done through a pipe using JSON input, but I can't find how to do it in Python.
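
One possible workaround, sketched here under assumptions: loop over the sentences and make the same voice.synthesize(text, wav_file) call used in the snippets earlier in this thread, writing one WAV file per sentence. synthesize_sentences is a hypothetical helper name, and voice is assumed to be the object returned by PiperVoice.load(...). This does not use the JSON input mentioned in the README.

```python
import os
import wave


def synthesize_sentences(voice, sentences, out_dir):
    """Write one WAV file per sentence and return the file paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, sentence in enumerate(sentences):
        path = os.path.join(out_dir, f"sentence_{index:03d}.wav")
        with wave.open(path, "wb") as wav_file:
            # Same call as in the earlier snippets; `voice` is assumed to
            # come from PiperVoice.load("your_model.onnx").
            voice.synthesize(sentence, wav_file)
        paths.append(path)
    return paths
```

A fresh file is opened for every sentence because the wave module does not allow its parameters to be changed once frames have been written to a file.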

thiswillbeyourgithub avatar May 08 '24 05:05 thiswillbeyourgithub

@synesthesiam Neat, a C API would be incredibly useful, especially for game design.

Do you have an idea when the next version of Piper will be released?

psych0v0yager avatar Aug 26 '24 00:08 psych0v0yager

+1 for a C or C++ API

WilliamTambellini avatar Aug 26 '24 02:08 WilliamTambellini

+1

jabamaus avatar Aug 26 '24 06:08 jabamaus