
SSML Support?

Open duplaja opened this issue 8 months ago • 3 comments

I apologize if I missed this somewhere in the documentation, but does the Python install support SSML? I'm converting some of my old scripts over from Mimic3, and didn't see any flag like Mimic3's --ssml.

Alternatively, what's the best way to add a break between paragraphs, if SSML is not supported?

Thank you!

duplaja, Nov 19 '23 01:11

This would be incredibly useful.

Any thoughts on when/whether this is in train?

nptrainor, Dec 31 '23 20:12

This is planned, but I haven't made any progress yet. Adding pauses and changing the playback speed are easy, but switching voices will require more changes.

synesthesiam, Jan 14 '24 05:01

@synesthesiam Another vote for SSML here. I'm especially interested in emphasizing some words/phrases. FWIW I wrote a script that allows embedding of a voice name in the speech text string that will switch the voice. I am happy to share that if there's interest. My plan is to have a "Speech Center" up and running on ZeroMQ (or any messaging thing like MQTT, Rabbit, etc) and different scripts would be able to send text to be spoken, with embedded voice commands, to the speech center via messaging. So far - these voices are AWESOME and the script I'm using makes them incredibly simple to switch between. Thank you for making all these available.
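The script mentioned above is not shown in this thread. As a rough sketch of the general idea only, the snippet below splits a text string on an inline voice tag so each segment can be sent to a different voice. The `{voice:NAME}` syntax and the `split_by_voice` helper are assumptions made up for illustration, not a Piper feature, and the voice names are just example values.

```python
import re

# Hypothetical inline tag syntax: {voice:NAME}. This is NOT part of Piper.
VOICE_TAG = re.compile(r"\{voice:([^}]+)\}")


def split_by_voice(text, default_voice):
    """Return (voice, text) pairs; each tag applies until the next tag."""
    segments = []
    voice = default_voice
    pos = 0
    for match in VOICE_TAG.finditer(text):
        # Text before this tag still belongs to the previous voice.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((voice, chunk))
        voice = match.group(1).strip()
        pos = match.end()
    # Whatever follows the last tag uses the last selected voice.
    tail = text[pos:].strip()
    if tail:
        segments.append((voice, tail))
    return segments
```

Each `(voice, text)` pair could then be synthesized separately with the matching model and the resulting audio played back in order.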

DaveXanatos, May 08 '24 02:05

That would be great :) I've been looking forward to SSML and other additions for a long time. I don't want to use any other TTS because I am satisfied with this one. SSML, pauses, and custom tags such as [laughter], [laughs], [sighs] (like in Bark TTS) would improve the experience a lot. If you could solve pauses first, with a similar parameter such as [wait=2s], that would be great. Thanks for your work!
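Until something like this exists natively, a `[wait=2s]` tag could be handled in a pre-processing step outside Piper. The sketch below is only an illustration: it splits text on such tags and produces raw silence to place between synthesized segments. The tag syntax comes from this comment; the helper names, the 22050 Hz sample rate, and the 16-bit mono format are assumptions and should be matched to the voice actually in use.

```python
import re

# Hypothetical tag from the comment above, e.g. [wait=2s] or [wait=0.5s].
WAIT_TAG = re.compile(r"\[wait=(\d+(?:\.\d+)?)s\]")


def split_on_waits(text):
    """Return ("text", str) and ("pause", seconds) items in document order."""
    items = []
    pos = 0
    for match in WAIT_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append(("pause", float(match.group(1))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


def silence_bytes(seconds, sample_rate=22050, sample_width=2):
    """Zero-valued mono PCM of the given duration.

    The default rate and width are assumptions, not values read from a model.
    """
    return b"\x00" * (int(seconds * sample_rate) * sample_width)
```

The "text" items would be synthesized one at a time, with `silence_bytes(...)` appended to the raw audio stream for each "pause" item.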

fantnhu, May 20 '24 13:05

I'm finally making some progress on SSML. The next version of Piper should support breaks (pauses), word/phoneme substitutions, and some say-as forms (number, date, etc.).

I can't do laughter and sighs, unfortunately. Those would have had to be present in the original datasets.
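For readers who want to experiment before that release, standard SSML `<break time="..."/>` elements can be pre-parsed with Python's standard library. This is a minimal sketch and not Piper's implementation: the `ssml_to_items` and `parse_break_time` helpers are hypothetical, the input is assumed to have no XML namespace, and only the `time` attribute is handled (`strength` is ignored).

```python
import xml.etree.ElementTree as ET


def parse_break_time(value):
    """Convert an SSML time value such as "500ms" or "2s" to seconds."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported time value: {value!r}")


def ssml_to_items(ssml):
    """Flatten SSML into ("text", str) and ("pause", seconds) items."""
    root = ET.fromstring(ssml)
    items = []

    def add_text(text):
        # Skip whitespace-only runs and normalize internal whitespace.
        if text and text.strip():
            items.append(("text", " ".join(text.split())))

    def walk(elem):
        if elem.tag == "break":
            # Assumption: a break without a time attribute adds no pause.
            items.append(("pause", parse_break_time(elem.get("time", "0s"))))
            return
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)  # text that follows the child element

    walk(root)
    return items
```

The resulting items have the same shape as a plain list of text segments and pauses, so each text segment can be synthesized on its own and silence inserted between them.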

synesthesiam, May 23 '24 01:05

This is excellent news! I've been creating some form of emphasis by adding a slight bit of time to the --length_scale and --sentence_silence parameters, but pauses and say-as are very welcome additions!!! Thanks!
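As an illustration of that workaround, the sketch below builds a Piper command line with slightly raised `--length_scale` and `--sentence_silence` values and feeds the text on standard input. The helper names, the default values, and the model filename are assumptions for the example; suitable values depend on the voice.

```python
import subprocess


def piper_command(model, output_file, length_scale=1.0, sentence_silence=0.2):
    """Build the argument list for one Piper run (defaults are illustrative)."""
    return [
        "piper",
        "--model", model,
        "--output_file", output_file,
        "--length_scale", str(length_scale),          # >1.0 slows speech down
        "--sentence_silence", str(sentence_silence),  # seconds after each sentence
    ]


def speak(text, model, output_file, **options):
    """Run Piper with the text on stdin; requires `piper` on PATH."""
    cmd = piper_command(model, output_file, **options)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
```

For example, `speak("This part matters.", "en_US-lessac-medium.onnx", "out.wav", length_scale=1.15, sentence_silence=0.6)` would render the sentence a little slower, with a longer pause after it.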

DaveXanatos, May 23 '24 02:05