Thorsten-Voice

Abbreviation (bzgl.) pronounced incorrectly (but works with espeak-ng)

Open · GithubAnon0000 opened this issue 1 year ago · 3 comments

Hello again!

I am using piper with the thorsten (high) voice. I wanted to see if it's possible to pronounce "bzgl." correctly without having to replace it with a separate string that says "bezüglich". But with your voice it always inserts a long pause, whereas with espeak it works fine.

Maybe you've got an idea?

Steps to reproduce

  1. Edit the espeak dictionary by adding the following line to the de_extra file: bzgl b@ts'y:klIC $dot
  2. Compile the dictionary and copy the resulting de_dict file to the espeak-ng-data directory that piper uses: sudo espeak-ng --compile=de && cp /usr/lib/x86_64-linux-gnu/espeak-ng-data/de_dict ../TTS/espeak-ng-data/de_dict
  3. Use echo "Ich habe Fragen bzgl. Ihrer Rückmeldung." | ./piper --model ./de_DE-thorsten-high.onnx --output-file ../OUTPUT/text.wav to generate the audio with your voice model.
  4. Use espeak-ng "Ich habe Fragen bzgl. Ihrer Rückmeldung." -v German --stdout > ../OUTPUT/text_espeak.wav to generate the same audio with espeak.
  5. Compare the results: OUTPUT.zip
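Steps 3 and 4 can also be scripted so that both WAV files are regenerated in one go after each dictionary change. This is a minimal sketch, assuming the same paths as in the steps above (./piper, ./de_DE-thorsten-high.onnx, ../OUTPUT); the helper names piper_cmd, espeak_cmd and render_both are hypothetical.

```python
import subprocess
from pathlib import Path

# Paths taken from the steps above; adjust them to your own layout.
PIPER = "./piper"
MODEL = "./de_DE-thorsten-high.onnx"
OUT_DIR = Path("../OUTPUT")
TEXT = "Ich habe Fragen bzgl. Ihrer Rückmeldung."


def piper_cmd(output: Path) -> list[str]:
    """Command for step 3: piper reads the text from stdin."""
    return [PIPER, "--model", MODEL, "--output-file", str(output)]


def espeak_cmd(text: str) -> list[str]:
    """Command for step 4: espeak-ng writes WAV data to stdout."""
    return ["espeak-ng", "-v", "German", "--stdout", text]


def render_both(text: str = TEXT) -> None:
    """Write text.wav (piper) and text_espeak.wav (espeak-ng) to OUT_DIR."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    # Step 3: feed the sentence on stdin, like `echo ... | ./piper ...`.
    subprocess.run(piper_cmd(OUT_DIR / "text.wav"),
                   input=text + "\n", text=True, check=True)
    # Step 4: capture the WAV bytes, like `espeak-ng ... --stdout > file`.
    wav = subprocess.run(espeak_cmd(text), capture_output=True, check=True).stdout
    (OUT_DIR / "text_espeak.wav").write_bytes(wav)
```

Calling render_both() from the directory that contains the piper binary should produce the same two files as steps 3 and 4.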

The voice is obviously different, but so is the pronunciation. A workaround is to simply use "bezüglich" instead of "bzgl.".

Expected Behavior

The pause after "bzgl." shouldn't be there.

Actual behavior

The pause is there.

Other things tried

Following the espeak dictionary documentation, I tried these alternatives one by one:

bzgl	b@ts'y:klIC $dot
bzgl	b@ts'y:klIC $hasdot
bzgl	bezüglich $text $dot
bzgl	bezüglich $text $hasdot

None of them were successful with the thorsten voice, though. Adding a dot after bzgl made it worse, even in espeak:

bzgl.	b@ts'y:klIC $dot
bzgl.	b@ts'y:klIC $hasdot
bzgl.	bezüglich $text $dot
bzgl.	bezüglich $text $hasdot
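To check whether espeak-ng actually applies one of these entries, one could compare the phonemes it produces for the abbreviated and the spelled-out sentence. This is a minimal sketch, assuming espeak-ng is on PATH; note that the CLI reads the system espeak-ng-data, not the copy piper loads from ../TTS/espeak-ng-data, so the compiled de_dict has to be present in both places (as in step 2). The helper names ipa, normalize and same_phonemes are hypothetical.

```python
import subprocess


def normalize(phonemes: str) -> str:
    """Strip surrounding whitespace per line and drop empty lines.

    Line breaks are kept on purpose: they may show where espeak-ng
    ended a clause, which is exactly what a stray pause would look like.
    """
    return "\n".join(line.strip() for line in phonemes.splitlines() if line.strip())


def ipa(text: str, voice: str = "de") -> str:
    """Return espeak-ng's IPA transcription of `text` (-q: no audio output)."""
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return normalize(result.stdout)


def same_phonemes(abbreviated: str, expanded: str) -> bool:
    """True if both sentences yield identical IPA output."""
    return ipa(abbreviated) == ipa(expanded)
```

For example, same_phonemes("Ich habe Fragen bzgl. Ihrer Rückmeldung.", "Ich habe Fragen bezüglich Ihrer Rückmeldung.") should return True if the dictionary entry is picked up and the dot is not treated as a clause boundary.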

Version info

piper: 1.2.0 OS: Debian oldstable (gnome 3.38.5, X11) python: 3.9.2

GithubAnon0000 · Aug 21 '24 16:08

I'll have to learn more about how the model was trained (and how piper uses the model), since I've come to the conclusion that the model itself is somehow causing this.

It happens with normal words and sentences too. The same sentence is not pronounced the same way each time, even though espeak's dictionaries are quite deterministic. Running piper with --debug actually shows the phonemes (just like espeak-ng --ipa), and they are identical.

Judging by that, the model probably has some sort of variance for some reason. I'll have to learn more about it first, but I believe the way the model was trained has something to do with it, since AI tends to do things like that (and you trained it using Coqui). Maybe it's more or less easily fixable (I'd prefer deterministic output if possible). It's low priority for me, though.
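One plausible source of this run-to-run variation: piper voices are VITS models, which sample random noise at inference time. The piper CLI exposes --noise_scale and --noise_w to scale that noise, so setting both to 0 should make repeated runs on the same input (nearly) identical, possibly at the cost of flatter prosody. This is a minimal sketch, assuming the ./piper binary and model path from the steps above; it renders the same sentence twice into run1.wav and run2.wav in the current directory and compares their hashes. The helper names are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def piper_cmd(model: str, output: Path, noise_scale: float, noise_w: float) -> list[str]:
    """Build a piper invocation with explicit noise settings; text goes on stdin."""
    return [
        "./piper", "--model", model, "--output-file", str(output),
        "--noise_scale", str(noise_scale), "--noise_w", str(noise_w),
    ]


def sha256(path: Path) -> str:
    """Hash a file's bytes so two WAVs can be compared exactly."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_deterministic(text: str, model: str = "./de_DE-thorsten-high.onnx",
                     noise_scale: float = 0.0, noise_w: float = 0.0) -> bool:
    """Render `text` twice with the given noise settings; True if the WAVs match."""
    outputs = [Path(f"run{i}.wav") for i in (1, 2)]
    for out in outputs:
        subprocess.run(piper_cmd(model, out, noise_scale, noise_w),
                       input=text + "\n", text=True, check=True)
    return sha256(outputs[0]) == sha256(outputs[1])
```

Running is_deterministic with the defaults versus, say, noise_scale=0.667 and noise_w=0.8 would show whether the noise is what makes identical phonemes sound different.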

GithubAnon0000 · Sep 02 '24 02:09

First of all, thank you for your great and detailed description 👍.

One idea might be to clean the text before TTS processing, e.g. using https://github.com/repodiac/german_transliterate . Is the $dot at the end of your dictionary entry required? Maybe it's the reason for the break, since a break is normally expected after a dot.

I tried your sentence on my Hugging Face spaces. In the Piper space, "Ich habe Fragen bezüglich. Ihrer Rückmeldung." has a break after "bezüglich.", while "Ich habe Fragen bezüglich Ihrer Rückmeldung." sounds good, without a break, just like the espeak speech flow.

My trained Coqui models also have that break (as expected) when a dot is added after "bezüglich".

So I'm not sure whether the $dot at the end of your dictionary entry has something to do with it.

thorstenMueller · Sep 03 '24 18:09

My trained Coqui models also have that break (as expected) when a dot is added after "bezüglich".

Yes, but they shouldn't, at least not when you use the actual abbreviation as outlined in the "Steps to reproduce" section: not "…bezüglich. …", but "… bzgl. …".

Is the $dot at the end of your dictionary entry required? Maybe it's the reason for the break, since a break is normally expected after a dot.

The $dot basically says that the word "bzgl." has a dot but isn't supposed to be spoken with a break after that dot. This works fine with espeak, but not with piper and your model. My guess is that the model never learned about abbreviations during training and therefore always inserts a break after a dot (which, in the case of "bzgl.", it shouldn't).

One idea might be to clean the text before TTS processing

Yes, that's what I'm currently doing (although with my own bash script). It works, since all I have to do is change abbreviations like "bzgl.", "z. B.", etc. to their long forms ("bezüglich", "zum Beispiel"). Because of that, this issue is low priority for me, as stated above. But if I could adjust the model or dictionary files somehow so that preprocessing becomes unnecessary, that would be great.
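The preprocessing workaround can be sketched in Python (the bash script mentioned above isn't shown, so this is not that script). This is a minimal sketch, assuming a hand-maintained abbreviation table and the ./piper binary and model path from the steps at the top; ABBREVIATIONS, expand_abbreviations and speak are hypothetical names.

```python
import re
import subprocess

# Hypothetical abbreviation table; extend it with whatever your texts contain.
ABBREVIATIONS = {
    "bzgl.": "bezüglich",
    "Bzgl.": "Bezüglich",
    "z. B.": "zum Beispiel",
    "z.B.": "zum Beispiel",
}

# Longest keys first so a longer abbreviation wins over a shorter overlapping one.
# (?<!\w) stops matches in the middle of a word, e.g. the "z.B." inside "xz.B.".
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def expand_abbreviations(text: str) -> str:
    """Replace every known abbreviation with its spelled-out form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


def speak(text: str, model: str = "./de_DE-thorsten-high.onnx",
          output: str = "text.wav") -> None:
    """Expand abbreviations, then pipe the result to piper on stdin."""
    subprocess.run(
        ["./piper", "--model", model, "--output-file", output],
        input=expand_abbreviations(text) + "\n", text=True, check=True,
    )
```

One limitation: an abbreviation that ends a sentence loses its period along with the abbreviation, so the sentence end has to be restored separately if that case matters.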

Thanks for your time and looking into it!

GithubAnon0000 · Sep 04 '24 11:09