[Feature request] Text for synthesis needs to be normalized for languages with diacritics
Text for synthesis needs to be normalized for languages with diacritics, or synthesis will be incorrect under certain circumstances.
For languages with diacritics, like German with its umlauts (äöü), there are often at least two ways to represent a character in Unicode text: precomposed (a single code point: ä) and decomposed (a base code point followed by a combining mark: a + ¨). Some text sources, such as piping a string read from a text file into the tts command via xargs, may not convert from the decomposed to the precomposed form. This is a problem because the models I tested (e.g. "thorsten/tacotron2-DDC") only synthesize an umlaut in its precomposed form; otherwise they simply ignore the combining diacritic and synthesize the base letter.
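To illustrate, here is a minimal Python snippet (standard library only) showing that the two spellings of "ä" are different code point sequences and that NFC normalization collapses the decomposed form into the precomposed one:

```python
import unicodedata

precomposed = "\u00e4"   # "ä" as a single code point (U+00E4)
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS (U+0308)

print(precomposed == decomposed)                                 # False
print(len(precomposed), len(decomposed))                         # 1 2
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```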
I’m not a Python dev. A hacky way of fixing this would be to modify "synthesize.py":

```python
import unicodedata

# … (rest of synthesize.py unchanged up to argument parsing)

args = parser.parse_args()
# Collapse decomposed characters (e.g. "a" + combining diaeresis) into their
# precomposed NFC form so the model only sees code points it can synthesize.
args.text = unicodedata.normalize('NFC', args.text)
```
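NFC (rather than NFD) seems to be the right target here, since, as noted above, the tested models only synthesize the precomposed forms; normalizing in the other direction would presumably leave every diacritic ignored.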
Alternatively, we could find some other way to make sure that the models are always supplied tokens they can synthesize.
The text normalization could be made optional via a command-line argument.
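A minimal sketch of how that opt-out might look, assuming an argparse-based CLI; the flag name `--no-text-normalization` and its placement are my own assumptions, not an existing tts option:

```python
import argparse
import unicodedata

parser = argparse.ArgumentParser()
parser.add_argument("--text", type=str, help="Text to synthesize.")
# Hypothetical opt-out flag: normalization stays enabled by default.
parser.add_argument(
    "--no-text-normalization",
    dest="normalize_text",
    action="store_false",
    help="Disable Unicode NFC normalization of the input text.",
)

args = parser.parse_args()
if args.normalize_text and args.text:
    args.text = unicodedata.normalize("NFC", args.text)
```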