StyleTTS2

Added importable module

Status: Open. lxe opened this issue 2 years ago; 21 comments.

Fixes #51

You need to provide your own phonemizer (because of this), and can use it like so:

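```python
from styletts2 import TTS  # importable module added by this PR
import sounddevice as sd   # only used for playback
import phonemizer          # GPL'd; supplied by the caller, not bundled with styletts2

# Load the LibriTTS config and checkpoint (hf:// paths point at the Hugging Face repo)
tts = TTS.load_model(
    config_path="hf://yl4579/StyleTTS2-LibriTTS/Models/LibriTTS/config.yml",
    checkpoint_path="hf://yl4579/StyleTTS2-LibriTTS/Models/LibriTTS/epochs_2nd_00020.pth"
)

# espeak-based phonemizer (needs espeak / espeak-ng installed on the system)
es_phonemizer = phonemizer.backend.EspeakBackend(
    language='en-us',
    preserve_punctuation=True,
    with_stress=True
)

# Compute a style vector from a reference voice clip
style = tts.compute_style('../tts-server/tts_server/voices/en-f-1.wav')

# The phonemizer is passed in explicitly at inference time
wav, _ = tts.inference(
    "This is a text! Hello world! How are you? What's your name?",
    style,
    phonemizer=es_phonemizer,
    alpha=0.3,
    beta=0.7,
    diffusion_steps=10,
    embedding_scale=2)

# Play the audio at 24 kHz and block until playback finishes
sd.play(wav, 24000)
sd.wait()
```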
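For readers wondering what `alpha` and `beta` do: assuming this wrapper follows the LibriTTS inference demo in the StyleTTS2 repository, the diffusion sampler produces a style vector whose first 128 dimensions are the acoustic (timbre) style and whose remaining dimensions are the prosodic style, and each half is linearly blended with the style computed from the reference clip. The sketch below reproduces only that blending step on plain Python lists; `mix_styles` and its `split=128` default are illustrative names, not this PR's code.

```python
from typing import List, Tuple

Vector = List[float]


def mix_styles(predicted: Vector, reference: Vector, alpha: float, beta: float,
               split: int = 128) -> Tuple[Vector, Vector]:
    """Blend a diffusion-sampled style vector with the reference-clip style.

    Dims [:split] are treated as the acoustic (timbre) style and mixed with alpha;
    dims [split:] are treated as the prosodic style and mixed with beta.
    A weight of 0 keeps the reference exactly; 1 uses only the sampled style.
    """
    if len(predicted) != len(reference):
        raise ValueError("style vectors must have the same length")
    acoustic = [alpha * p + (1 - alpha) * r
                for p, r in zip(predicted[:split], reference[:split])]
    prosodic = [beta * p + (1 - beta) * r
                for p, r in zip(predicted[split:], reference[split:])]
    return acoustic, prosodic
```

With `alpha=0.3, beta=0.7`, the output would keep 70% of the reference clip's timbre while taking 70% of its prosody from the sampled style.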

lxe, Nov 26 '23 23:11

See https://github.com/yl4579/StyleTTS2/pull/78#issuecomment-1826117745: this has the same GPL license problem.

yl4579, Nov 27 '23 00:11

Phonemizer was already included in the project. I can remove the phonemizer dependency and just let people pass in their own phonemizers.
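A minimal sketch of the "bring your own phonemizer" design, assuming the TTS code only needs a phonemizer-style `phonemize(list_of_texts) -> list_of_strings` method (roughly how the StyleTTS2 demo calls `EspeakBackend`). `Phonemizer`, `text_to_phonemes` and `LowercasePhonemizer` are hypothetical names for illustration; the module never imports a phonemizer package itself.

```python
from typing import List, Optional, Protocol


class Phonemizer(Protocol):
    """Structural type: any object with a batch `phonemize` method.

    phonemizer's EspeakBackend fits this shape (list of str in, list of str out),
    but so would any alternative G2P wrapper.
    """

    def phonemize(self, text: List[str]) -> List[str]: ...


def text_to_phonemes(text: str, phonemizer: Optional[Phonemizer]) -> str:
    """Hypothetical helper: the caller injects the phonemizer explicitly."""
    if phonemizer is None:
        raise ValueError("pass a phonemizer explicitly, e.g. an EspeakBackend instance")
    (phonemes,) = phonemizer.phonemize([text])  # one input text -> exactly one output
    return phonemes.strip()


class LowercasePhonemizer:
    """Toy stand-in for testing the plumbing; it does not produce real phonemes."""

    def phonemize(self, text: List[str]) -> List[str]:
        return [t.lower() for t in text]
```

Because `Phonemizer` is a `Protocol`, callers do not have to subclass anything; any object with a matching method works.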

lxe, Nov 27 '23 00:11

Ah, your usage of phonemizer is "only to run the demo":

https://github.com/yl4579/StyleTTS2/blob/17c6b6120ca99b193ed500fa8c6dc1820edccff8/README.md?plain=1#L39

Which I guess makes sense in this case :)

lxe, Nov 27 '23 00:11

Also, I tried @fakerybakery's idea of using DeepPhonemizer, but it's not nearly as good as espeak.

lxe, Nov 27 '23 00:11

I changed it so that a phonemizer must be explicitly loaded and passed in:

```python
wav, _ = tts.inference(
    "This is a text! Hello world! How are you? What's your name?",
    style,
    phonemizer=es_phonemizer,
    alpha=0.3,
    beta=0.7,
    diffusion_steps=10,
    embedding_scale=2)
```

lxe, Nov 27 '23 01:11

Hi @lxe, my fork supports importing. I think the author @yl4579 mentioned it would be better to keep a separate GPL'd fork.

https://github.com/NeuralVox/StyleTTS2

I'll try to keep it in sync with the main repo.

fakerybakery, Nov 27 '23 01:11

@fakerybakery @lxe Have you checked https://github.com/lingjzhu/CharsiuG2P?

yl4579, Nov 27 '23 01:11

Hmm, looks interesting. It's basically a T5 model trained for grapheme-to-phoneme conversion. I'll try it out in the coming days.

fakerybakery, Nov 27 '23 02:11

The tiny model seems to have some issues; I'll try the larger models later.

Input text: Hello world!
CharsiuG2P: hɛlowoɐ̯ldˈeslo
Phonemizer: həloʊ wɜːld

fakerybakery, Nov 27 '23 02:11

Yup, I've been checking out CharsiuG2P and Text2PhonemeSequence.

They don't handle stress well and have other artifacts.

lxe, Nov 27 '23 02:11

Opportunity for a new open-source project: a phonemizer alternative that supports many languages and is compatible with espeak!

fakerybakery, Nov 27 '23 02:11

Coqui ships an MPL 2.0 / commercial product, but uses espeak-ng like this?

lxe, Nov 27 '23 02:11

Yeah, they're probably violating the license (IANAL). Does anyone know C well enough to reverse-engineer espeak?

fakerybakery, Nov 27 '23 02:11

Gruut is a bust too. It over-stresses words and isn't nearly as accurate as espeak.

lxe, Nov 27 '23 02:11

Sort of funny. MPL is compatible with GPL but not the other way around.

fakerybakery, Nov 27 '23 02:11

Yeah. Training a T5 model on phonemizer output doesn't seem too hard, though: you just get a text dataset in the target language, phonemize it with phonemizer, and train the model. The main issue is cost. @yl4579, if a multilingual phonemizer dataset were available, would the compute you have access to be enough to train a T5 phonemizer model?
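The data-generation half of that plan could look roughly like this: pair each sentence of a text corpus with its phonemized form and write the pairs as JSON Lines for seq2seq training. `build_g2p_pairs` and `to_jsonl` are hypothetical helpers; `phonemize` is any batch callable, for example the `phonemize` method of phonemizer's `EspeakBackend`. Training the T5 model itself is not shown.

```python
import json
from typing import Callable, Dict, Iterable, List

PhonemizeBatch = Callable[[List[str]], List[str]]


def build_g2p_pairs(sentences: Iterable[str], phonemize: PhonemizeBatch,
                    batch_size: int = 256) -> List[Dict[str, str]]:
    """Pair each non-empty sentence with its phonemized form, calling the phonemizer in batches."""
    pairs: List[Dict[str, str]] = []
    batch: List[str] = []

    def flush() -> None:
        outputs = phonemize(batch)
        if len(outputs) != len(batch):
            raise ValueError("phonemizer returned a different number of outputs than inputs")
        pairs.extend({"text": t, "phonemes": p.strip()} for t, p in zip(batch, outputs))
        batch.clear()

    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines in the corpus
        batch.append(sentence)
        if len(batch) >= batch_size:
            flush()
    if batch:
        flush()  # phonemize the final partial batch
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize pairs as JSON Lines; ensure_ascii=False keeps IPA symbols readable."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```

Batching matters here because calling a phonemizer once per sentence is usually much slower than handing it many sentences at a time.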

fakerybakery, Nov 27 '23 02:11

Coqui TTS does this by expecting an espeak-ng binary to be installed on the system. That actually doesn't seem to violate the GPL.
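A minimal sketch of that approach: call a separately installed espeak-ng as an external process via `subprocess` instead of linking to it. The function names are hypothetical; only espeak-ng's `-q` (no audio), `--ipa` (print IPA phonemes) and `-v` (voice) options are assumed. This shows the mechanics only, not a licensing conclusion.

```python
import shutil
import subprocess
from typing import List


def build_espeak_command(text: str, voice: str = "en-us", exe: str = "espeak-ng") -> List[str]:
    """Argument list for a one-shot espeak-ng call that prints IPA instead of speaking."""
    # -q: quiet (no audio output); --ipa: write phonemes as IPA; -v: voice/language
    return [exe, "-q", "--ipa", "-v", voice, text]


def phonemize_with_binary(text: str, voice: str = "en-us", exe: str = "espeak-ng") -> str:
    """Run an externally installed espeak-ng in a separate process and return its IPA output."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe!r} not found on PATH; install espeak-ng separately")
    result = subprocess.run(
        build_espeak_command(text, voice, exe),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # espeak-ng may print one line per clause; join them into a single string
    return " ".join(line.strip() for line in result.stdout.splitlines() if line.strip())
```

Passing an argument list (rather than a shell string) avoids shell quoting problems with user-supplied text.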

lxe, Nov 27 '23 02:11

Hmm, does phonemizer do the same thing? Also, we could always write a script that starts a phonemizer server on localhost and have the TTS code call its API.
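A minimal sketch of the localhost-server idea using only the Python standard library: the server process owns the phonemizer, and the TTS side only makes HTTP calls. The JSON shape (`{"text": ...}` in, `{"phonemes": ...}` out) and all function names are assumptions; the toy lambda stands in for something like `EspeakBackend(...).phonemize([text])[0]`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_handler(phonemize: Callable[[str], str]) -> type:
    """Build a request-handler class bound to the given phonemize function."""

    class PhonemizeHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # Expect a JSON body like {"text": "Hello world!"}
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps({"phonemes": phonemize(payload.get("text", ""))}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence the default per-request logging

    return PhonemizeHandler


def start_server(phonemize: Callable[[str], str], host: str = "127.0.0.1",
                 port: int = 0) -> ThreadingHTTPServer:
    """Serve phonemize() in a background thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(phonemize))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_phonemize(text: str, url: str) -> str:
    """Client side: what the TTS code would call instead of importing phonemizer."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["phonemes"]


if __name__ == "__main__":
    server = start_server(lambda text: text.lower())  # toy stand-in for a real phonemizer
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(remote_phonemize("Hello World", url))  # prints "hello world"
    server.shutdown()
    server.server_close()
```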

fakerybakery, Nov 27 '23 02:11

Relevant discussions:

https://github.com/rhasspy/piper/issues/93
https://github.com/espeak-ng/espeak-ng/issues/908

lxe, Nov 29 '23 21:11

If there is a decent (or even occasionally usable) phonemizer alternative, I can integrate it into my TTS web UI. Since I ship full install scripts, the "install phonemizer yourself" approach isn't really viable.

rsxdalv, Jan 16 '24 20:01

Use gruut; see the styletts2 pip package on PyPI.

fakerybakery, Jan 16 '24 20:01