piper Different tones on different parts of the text

It would be nice to be able to apply specific tones to single words or whole sentences.

For example:

emphasis
sarcasm
whispering

(also all combinations, like whispering sarcasm)

Is this already possible somehow? I saw espeak generates emphasis markers anyway, but maybe this could be altered manually in some way?

Alternatively, I could probably train several variants of one voice and switch between them. However, it doesn't seem possible to switch voices without causing pauses, even when setting "sentence_silence" to 0. Still, this would probably be the best workaround so far.
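For anyone trying the multiple-voices workaround: a minimal sketch of the joining step, assuming each text segment has already been synthesized to its own WAV (one per voice variant) and that all segments share the same sample rate, sample width and channel count. The function name `concat_wavs` is made up for illustration; it only uses Python's standard `wave` module, not any piper API.

```python
import io
import wave


def concat_wavs(segments: list[bytes]) -> bytes:
    """Join WAV byte strings back to back, adding no silence in between."""
    if not segments:
        raise ValueError("need at least one segment")
    out = io.BytesIO()
    expected = None
    with wave.open(out, "wb") as writer:
        for data in segments:
            with wave.open(io.BytesIO(data), "rb") as reader:
                fmt = (
                    reader.getnchannels(),
                    reader.getsampwidth(),
                    reader.getframerate(),
                )
                if expected is None:
                    # The first segment defines the output format.
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("all segments must share one audio format")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

This only avoids gaps added at the joins. Any leading or trailing silence that is already inside a synthesized segment would still be audible and would need trimming separately.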

It would still be nice if such a feature existed, preferably without needing to train new voices (if possible).

May 04 '24 04:05 porky11

I second this. Some sort of markup language for this would be nice, if one exists.

May 19 '24 04:05 Daburnell112

Hi, I will be working on a pitch-conditioning model soon and may open a PR with these additions, along with new updates on the piper side.

May 19 '24 13:05 rmcpantoja

@porky11 - how could sarcasm or whispering be applied to an output voice without training with relevant voice recordings?

Is there some process you have in mind that could be applied to the audio to achieve this? My suspicion is that there isn't a viable way to do this (without the audio + training).

May 29 '24 00:05 nmstoker

In my experience, you need audio data for each case (sarcasm, whispering, etc.), and then a multi-speaker model needs to be trained with each case being a different "speaker".

This is exactly what the Thorsten emotional voice does (German).
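With such a model, switching tone per sentence comes down to picking a speaker per sentence. A rough sketch of that, assuming piper's `--json-input` mode, where (as far as I know from the README) each input line is a JSON object with a `text` field and an optional `speaker_id`. The style names and ids in `STYLE_TO_SPEAKER_ID` are hypothetical; the real ones depend on the voice and should be read from the `speaker_id_map` in its config file.

```python
import json

# Hypothetical mapping for illustration only; real ids differ per voice.
STYLE_TO_SPEAKER_ID = {"neutral": 0, "whisper": 1, "emphasis": 2}


def to_json_lines(segments, style_map=STYLE_TO_SPEAKER_ID):
    """Turn (style, text) pairs into one JSON object per line."""
    lines = []
    for style, text in segments:
        if style not in style_map:
            raise KeyError(f"no speaker trained for style {style!r}")
        record = {"text": text, "speaker_id": style_map[style]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json_lines([
        ("neutral", "This part is spoken normally."),
        ("whisper", "This part is whispered."),
    ]))
```

The printed lines could then be piped into piper running with `--json-input`. Note that this works per sentence at best; changing the tone of a single word inside a sentence is not covered by this approach.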

May 29 '24 04:05 synesthesiam