
Different tones on different parts of the text

Open · porky11 opened this issue 1 year ago · 5 comments

It would be nice if it were possible to apply specific tones to single words or whole sentences.

For example:

  • emphasis
  • sarcasm
  • whispering

(and any combination of these, such as whispered sarcasm)

Is this already possible somehow? I noticed that espeak already generates emphasis markers; could these be altered manually in some way?

Alternatively, I could probably train and use different variants of a voice. However, it seems impossible to switch voices without introducing pauses, even with "sentence_silence" set to 0. Still, this is probably the best workaround so far.
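If each tone is a separate voice, one possible way to reduce the audible gap at a switch is to synthesize each segment on its own and trim near-silent samples from its edges before concatenating. This is a minimal sketch, not a Piper feature: it assumes every segment is already available as float samples in [-1, 1] at the same sample rate, and the 0.01 threshold is an arbitrary assumption.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


def join_segments(segments: Sequence[Sequence[float]], threshold: float = 0.01) -> List[float]:
    """Concatenate per-voice segments after trimming the silence at their edges."""
    out: List[float] = []
    for seg in segments:
        out.extend(trim_silence(seg, threshold))
    return out


# Two toy "segments" with silence around the voiced samples.
print(join_segments([[0.0, 0.0, 0.5, -0.4, 0.0], [0.0, 0.3, 0.0]]))
```

A hard cut like this may produce clicks at the joins; a short crossfade between segments would likely sound smoother.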

It would still be nice if such a feature existed, preferably without having to train new voices (if possible).

porky11 · May 04 '24

I second this. Some sort of markup language for this would be nice, if one exists.
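Nothing in this thread indicates that such a markup exists in Piper. As a purely hypothetical sketch of what the parsing side could look like, inline tags (the tag names `whisper` and `sarcasm` and the `<tag>…</tag>` syntax are made up) can be turned into text segments, each carrying the set of tones active at that point, so that nesting yields combinations such as whispered sarcasm. Producing audio for each tone set is a separate problem.

```python
import re
from typing import FrozenSet, List, Tuple

# Hypothetical tag syntax, e.g. "<whisper>so <sarcasm>great</sarcasm></whisper>".
TAG = re.compile(r"<(/?)([a-z]+)>")

Segment = Tuple[str, FrozenSet[str]]


def parse_tones(text: str) -> List[Segment]:
    """Split marked-up text into (text, active tones) segments."""
    segments: List[Segment] = []
    stack: List[str] = []  # currently open tags, innermost last
    pos = 0
    for match in TAG.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], frozenset(stack)))
        closing, name = match.group(1), match.group(2)
        if closing:
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced closing tag </{name}>")
            stack.pop()
        else:
            stack.append(name)
        pos = match.end()
    if stack:
        raise ValueError(f"unclosed tag <{stack[-1]}>")
    if pos < len(text):
        segments.append((text[pos:], frozenset()))
    return segments


print(parse_tones("Oh <whisper>that is <sarcasm>great</sarcasm></whisper>."))
```

Using sets rather than a single style per segment keeps combinations order-independent, so `<sarcasm><whisper>…` and `<whisper><sarcasm>…` map to the same tone set.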

Daburnell112 · May 19 '24

Hi, I will be working on a pitch-conditioning model soon and may submit these additions as a PR, together with new updates from the Piper side.

rmcpantoja · May 19 '24

@porky11 - how could sarcasm or whispering be applied to an output voice without training on relevant voice recordings?

Is there some process you have in mind that could be applied to the audio to achieve this? My suspicion is that there isn't a viable way to do this without the audio and training.

nmstoker · May 29 '24

In my experience, you need audio data for each case (sarcasm, whispering, etc.), and then a multi-speaker model has to be trained with each case as a different "speaker".

This is exactly what the Thorsten emotional voice does (German).
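With a model trained this way, a front end would choose a speaker per text segment. This is a minimal sketch, assuming the tones have already been extracted as (text, set of tones) pairs and assuming a hypothetical `SPEAKER_IDS` table; the real IDs depend entirely on how the model was trained. Unknown combinations fall back to the neutral speaker, and adjacent segments that resolve to the same speaker are merged to keep the number of voice switches down.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical mapping from a set of tones to a speaker ID in a multi-speaker
# model trained with one "speaker" per tone (or tone combination).
SPEAKER_IDS: Dict[FrozenSet[str], int] = {
    frozenset(): 0,  # neutral
    frozenset({"whisper"}): 1,
    frozenset({"sarcasm"}): 2,
    frozenset({"whisper", "sarcasm"}): 3,
}


def assign_speakers(segments: List[Tuple[str, FrozenSet[str]]]) -> List[Tuple[str, int]]:
    """Map each (text, tones) segment to a speaker ID, merging neighbours that share one."""
    result: List[Tuple[str, int]] = []
    for text, tones in segments:
        # Fall back to the neutral speaker for tone sets the model was not trained on.
        speaker = SPEAKER_IDS.get(tones, SPEAKER_IDS[frozenset()])
        if result and result[-1][1] == speaker:
            result[-1] = (result[-1][0] + text, speaker)
        else:
            result.append((text, speaker))
    return result


print(assign_speakers([("Oh ", frozenset()), ("great", frozenset({"whisper", "sarcasm"}))]))
```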

synesthesiam · May 29 '24