Different tones on different parts of the text
It would be nice if it were possible to apply specific tones to single words or whole sentences.
For example:
- emphasis
- sarcasm
- whispering
(also all combinations, like whispering sarcasm)
Is this already possible somehow? I saw espeak generates emphasis markers anyway, but maybe this could be altered manually in some way?
Alternatively, I could probably train several variants of a voice and switch between them. However, switching voices seems to cause pauses, even with "sentence_silence" set to 0. Still, this is probably the best workaround so far.
It would still be nice if such a feature existed, preferably without the need of training new voices (if possible).
I second this. Some sort of markup language for this would be nice, if one exists.
Hi, I will be working on a pitch conditioning model soon and may open a PR with these additions, along with new updates from the Piper side.
@porky11 - how could sarcasm or whispering be applied to an output voice without training with relevant voice recordings?
Is there some process you have in mind that could be applied to the audio to achieve this? My suspicion is that there isn't a viable way to do this (without the audio + training).
In my experience, you need audio data for each case (sarcasm, whispering, etc.), and then a multi-speaker model needs to be trained with each case treated as a different "speaker".
This is exactly what the Thorsten emotional voice does (German).
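To make the "one speaker per style" idea concrete, here is a rough sketch of how tagged text could be split into segments, each routed to a speaker ID of such a multi-speaker model. Everything in it is an assumption for illustration: the `[style]...[/style]` tag syntax, the `STYLE_TO_SPEAKER` mapping, and the `split_by_style` helper are hypothetical and not part of Piper. It only prepares (speaker ID, text) pairs; the actual synthesis call is left out, and combined styles such as whispering sarcasm would each need their own trained "speaker".

```python
import re

# Hypothetical mapping from style name to speaker ID in a multi-speaker
# model trained with one "speaker" per style. The IDs are made up.
STYLE_TO_SPEAKER = {"neutral": 0, "emphasis": 1, "sarcasm": 2, "whisper": 3}

# Hypothetical tag syntax: [style]text[/style]. Nesting is not supported.
TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def split_by_style(text, default="neutral"):
    """Split tagged text into a list of (speaker_id, text) pairs."""
    segments = []
    pos = 0
    for match in TAG.finditer(text):
        # Untagged text before this tag uses the default style.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((STYLE_TO_SPEAKER[default], plain))
        style, body = match.group(1), match.group(2).strip()
        if style not in STYLE_TO_SPEAKER:
            raise ValueError(f"unknown style: {style}")
        if body:
            segments.append((STYLE_TO_SPEAKER[style], body))
        pos = match.end()
    # Untagged text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append((STYLE_TO_SPEAKER[default], tail))
    return segments


segments = split_by_style(
    "That is [sarcasm]a great idea[/sarcasm] for sure. "
    "[whisper]Do not tell anyone.[/whisper]"
)
for speaker_id, chunk in segments:
    print(speaker_id, chunk)
```

Each pair would then be synthesized with the matching speaker ID and the audio concatenated. Whether that avoids the pauses mentioned above depends on how the model handles segment boundaries, which this sketch does not address.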