espeak-ng
Klatt synthesis too low on high frequencies
As the title says, the high frequencies are too quiet when using klatt6. I've been testing NV Speech Player in NVDA (the off-spec version), and I notice a big difference in sound, mainly on small speakers. I wanted to modify the klatt6-based variants, but klatt6 does not support the volume parameter in formants. I think a good way to investigate this is to listen to eSpeak's frequency balance without any variant and compare it with the klatt6 variant, which does not modify the formants.
Why do you think this is a problem?
I think it is a problem because it affects the intelligibility of speech on small speakers. Many screen reader users prefer Klatt synthesis. They prefer it so much that they crack Eloquence.
OK, I get you there, but I'm struggling to understand what you mean by frequencies. Do you mean when you change the pitch? This is a general problem with eSpeak in NVDA: when you have capital pitch change enabled, the formants shift along with the pitch. But please correct me if that's not what you mean.
I have noticed that problem too. As the pitch increases, the formants increase with it, causing the voice characteristics to change. But that does not interest me much, although it would be good if the voice retained its characteristics.
When I speak of frequencies I mean the formants. Think of an equalizer that increases the volume of some frequencies and decreases the volume of others. The balance I want is precisely what I hear in the Klatt synthesis from NV Speech Player. In klatt6, the high formants are low in volume and their volume cannot be changed in variant files. Only the frequency and the bandwidth of each formant can be changed.
Example:
formant 5 100 600 100
The volume of formant 5 is at 600, but there is no effect.
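To make the example above easier to read, here is a minimal sketch that splits a `formant` line into named fields. It assumes the field order given in the eSpeak NG voice file documentation (`formant <number> <frequency> <strength> <width> [<freq_add>]`, with the first three values as percentages of the defaults). The `FormantAdjust` class and `parse_formant_line` function are hypothetical names for illustration, not part of eSpeak NG.

```python
from dataclasses import dataclass


@dataclass
class FormantAdjust:
    """One 'formant' line from a variant file (hypothetical helper type)."""
    number: int          # which formant the line adjusts
    frequency_pct: int   # frequency, percent of the default
    strength_pct: int    # strength (volume), percent of the default
    width_pct: int       # bandwidth, percent of the default
    freq_add_hz: int = 0 # optional offset in Hz, assumed 0 when omitted


def parse_formant_line(line: str) -> FormantAdjust:
    """Parse a line such as 'formant 5 100 600 100'."""
    parts = line.split()
    if len(parts) not in (5, 6) or parts[0] != "formant":
        raise ValueError(f"not a formant line: {line!r}")
    values = [int(p) for p in parts[1:]]
    return FormantAdjust(*values)


if __name__ == "__main__":
    adj = parse_formant_line("formant 5 100 600 100")
    # The third value is the strength field, the one reported to have no effect.
    print(adj.number, adj.strength_pct)
```

Read this way, the line asks for formant 5 at its default frequency and width with six times the default strength, which is the setting that appears to be ignored under klatt6.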
I have the impression that the signal passes through a low-pass filter before it reaches the noise generator or the wave in cycles.
I have done some tests using NV Speech Player, specifically changing the sample rate from the Python code. The sample rate changes how loud the high frequencies sound. The default rate is 16000 Hz. If I decrease it to 11025 Hz, the treble gains presence. If I change it to 22050 Hz, the treble loses presence. I've done some testing using Nyquist filters in Audacity with a sawtooth wave, and the same thing happens. The solution that I think would work in eSpeak NG is to allow phonemes to be compiled at 16000 or 11025 Hz.
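One way to check the sample rate observation numerically is to look at how a single formant resonator responds at each rate. The sketch below uses the textbook second-order resonator from Klatt's 1980 synthesizer description, not code taken from eSpeak NG or NV Speech Player, so it may not match either implementation. The 4500 Hz centre frequency and 200 Hz bandwidth are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import cmath
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (Klatt 1980 form)."""
    t = 1.0 / sample_rate_hz
    r = math.exp(-math.pi * bandwidth_hz * t)
    c = -(r * r)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c


def gain_at(probe_hz, freq_hz, bandwidth_hz, sample_rate_hz):
    """Magnitude of the resonator's frequency response at probe_hz."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    z_inv = cmath.exp(-2j * math.pi * probe_hz / sample_rate_hz)
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))


if __name__ == "__main__":
    formant_hz, bandwidth_hz = 4500.0, 200.0  # assumed example values
    for rate in (11025, 16000, 22050):
        peak = gain_at(formant_hz, formant_hz, bandwidth_hz, rate)
        print(f"{rate} Hz: gain at {formant_hz:.0f} Hz = "
              f"{20.0 * math.log10(peak):.1f} dB (Nyquist {rate / 2:.1f} Hz)")
```

Because the coefficients depend on the sample rate, the gain at a fixed formant frequency is not the same at every rate. Comparing the printed values against what is heard would help tell a resonator effect apart from an actual low-pass filter in the signal path.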