
Brazilian Portuguese MBrola rules need adjustments

Open Cleversn opened this issue 4 years ago • 5 comments

About six years ago, I pointed out a problem to Reece Dunn, and he identified its cause, but we didn't work on a solution at the time. Below, I am pasting the message I wrote to him describing the problem, followed by his answer identifying the real cause. I'm attaching the necessary files as well.

Cleverson writes: Hi Reece, I'll try to describe the eSpeak + MBrola issue I mentioned yesterday. I think it has to do with phoneme durations; maybe you can sort it out.

Here in Brazil, there are some projects that use MBrola, such as Virtual Vision, a screen reader that was popular in Brazil in the late 1990s. I got used to its voice, which was actually the MBrola br1 voice and quite pleasant. eSpeak supports it, as well as the br2/3/4 voices, but none of them sounds pleasant when used with eSpeak.

To demonstrate this, I've generated six wave files in which a piece of Portuguese text is spoken using br1, so you can listen and hear what I mean. Three of them were generated at different speeds using the phoplayer tool that comes with the MBrola package. All three sound regular and flat, i.e., all consonants and vowels have similar durations, which makes the text not only understandable but cool to hear, at least for me.

Then I generated three files at different speeds with the eSpeak MBrola voices, having eSpeak read the same piece of text. The result sounds irregular, i.e., some vowels seem too short and others too long, and the same goes for some consonants, which makes some syllables, or even whole words, sound too fast, followed by others that sound too slow. This is more noticeable at high speeds.

Do you think you could manage to make the eSpeak experience come closer to that of phoplayer in this regard?

I'm attaching the six wave files plus the Portuguese text file for you.

Attachment: audios-and-text.zip

Reece answers: The problem is due to the eSpeak pronunciation rules interacting with the br1 voice.

MBROLA voices work on what are called diphones. These are pairs of individual sounds (phones) for the given language. Thus, the letter 'M' is pronounced 'eme' and is split into the diphones '_-e e-m m-e e-_', where the underscore represents silence/pauses. Each voice either has audio for a diphone, which is used to reconstruct the speech, or has that diphone marked as unknown, in which case MBROLA will insert silence.

The issue you are getting with eSpeak is the pronunciation hitting these unknown diphones. For example, for the word 'MBrola', the eSpeak pt-br voice uses 'eme', but mb-br1 uses 'm_', resulting in the message "Warning: m-_ unkown, replaced with _-_" from mbrola. Thus, what you hear is '_-m _-b': the start of an 'm' sound with its second half clipped, followed by a 'b' sound. This is why the voices don't sound pleasant.

In this case, commenting out line 734 of dictsource/pt_rules will make the MBROLA voice use 'eme' correctly. There are other similar errors, which will require adjustments to either pt_rules or the phsource/mbrola/ptbr file. The different MBROLA voices support different diphones, so getting the balance right can be tricky.

Thanks,
Reece
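To make the diphone explanation concrete, here is a minimal Python sketch (a toy model, not MBROLA's or eSpeak NG's actual code) of how a phone sequence is padded with silence, split into diphones, and what happens when a diphone is missing from a voice's inventory. The inventory below is a made-up set, not the real br1 diphone list, and the warning string only paraphrases the mbrola message quoted above.

```python
"""Toy model of MBROLA-style diphone lookup (illustrative only)."""

SILENCE = "_"


def to_diphones(phones: list[str]) -> list[tuple[str, str]]:
    """Pad a phone sequence with silence and return each adjacent pair."""
    padded = [SILENCE, *phones, SILENCE]
    return list(zip(padded, padded[1:]))


def render(
    phones: list[str], inventory: set[tuple[str, str]]
) -> tuple[list[tuple[str, str]], list[str]]:
    """Return the diphones that would be played plus any warnings.

    A diphone missing from the voice's inventory is replaced with
    silence (_-_), as described in the answer above.
    """
    played: list[tuple[str, str]] = []
    warnings: list[str] = []
    for left, right in to_diphones(phones):
        if (left, right) in inventory:
            played.append((left, right))
        else:
            # Paraphrase of the mbrola warning, not its exact text.
            warnings.append(f"{left}-{right} unknown, replaced with _-_")
            played.append((SILENCE, SILENCE))
    return played, warnings


# Hypothetical toy inventory: it has every diphone needed for "eme",
# plus _-m, but deliberately lacks m-_ (NOT the real br1 list).
TOY_INVENTORY = {("_", "e"), ("e", "m"), ("m", "e"), ("e", "_"), ("_", "m")}

if __name__ == "__main__":
    # Spelled-out letter name 'eme': every diphone is available.
    print(render(["e", "m", "e"], TOY_INVENTORY))
    # Bare 'm' followed by silence: m-_ is missing, so the end of the m is lost.
    played, warnings = render(["m"], TOY_INVENTORY)
    print(played)    # [('_', 'm'), ('_', '_')]
    print(warnings)  # ['m-_ unknown, replaced with _-_']
```

The toy model ignores the durations and pitch values that real MBROLA input also carries; it only illustrates why a missing m-_ diphone clips the second half of the 'm'.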

Cleversn, Dec 11 '21 19:12

To fix missing phoneme transitions, see the MBROLA voices documentation, especially the section "Add MBROLA phoneme translation file".

valdisvi, Dec 14 '21 16:12

This issue still exists today. Could anyone more familiar with the MBrola structure start fixing this? I would help test the results and even tweak the voice. The problem is that I'm not familiar enough with the integration of MBrola and eSpeak NG, so I'd probably make things worse rather than fix them.

Cleversn, Oct 09 '25 21:10

Hi @Cleversn!

I created a pull request a few days ago. I'm trying to finish the PR this weekend.

If you're interested, there are two links to YouTube videos in the PR. The second link is the MBrola br1 voice reading a newspaper article. I think this voice sounds better than before, but I still need to make some adjustments.

This is the PR: https://github.com/espeak-ng/espeak-ng/pull/2297

Fábio.

fabiolimace, Oct 11 '25 07:10

I just posted a 15-minute video on YouTube that shows all the Brazilian voices reading a sample of the book of Genesis. Here is the link to the new video: https://www.youtube.com/watch?v=5eFqZErezx0

fabiolimace, Oct 11 '25 08:10

Hi Fábio, thanks a lot! Especially on Linux, this will be the best synthesiser.

Cleversn, Oct 11 '25 12:10