TransformerTTS
No numbers in phonemes set and collapse of whitespaces
When using phonemizer (espeak-ng), the output contains digits that reflect the vowel/sound variants, as in the following:
import phonemizer

text = 'Có lối ra, chúng ta qua đó xem sao.'
phonemizer.phonemize(
    text,
    language='vi',
    backend='espeak',
    strip=False,
    preserve_punctuation=True,
    punctuation_marks=';:,.!?¡¿—…"«»“”',
    with_stress=True,
    language_switch='keep-flags',
    njobs=1
)
output:
'ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .'
With tokenizer._postprocess:
text = ''.join([c for c in text if c in all_phonemes]) # --> will remove numbers which are not in phonemes set
text = _collapse_whitespace(text)
output:
ɡˈɔɜ lˈoɪɜ zˈaː,tɕˈuɜŋ tˈaː wˈaː ɗˈɔɜ sˈɛm ʂˈaːʊ.
Outputs placed together:
ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .
ɡˈɔɜ lˈoɪɜ zˈaː,tɕˈuɜŋ tˈaː wˈaː ɗˈɔɜ sˈɛm ʂˈaːʊ.
My question: will the missing numbers (here 7 and 1) and the missing spaces around punctuation such as the comma, as in zˈaː,tɕˈuɜŋ tˈaː instead of zˈaː7 , tɕˈuɜŋ t̪ˈaː1, affect the alignment and the pauses between generated words?
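The two effects described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: the symbol set is a hypothetical, heavily reduced stand-in for the one in data/text/symbols.py, and `collapse_whitespace` is an assumed re-implementation whose behaviour (squeezing whitespace and dropping spaces around punctuation) is inferred only from the output quoted above.

```python
import re

# Hypothetical, reduced symbol set: just enough IPA symbols and punctuation
# to cover the example sentence. It contains no digits and no combining marks.
_phonemes = 'ɡɔɜloɪzaːtɕuŋwɗsɛmʂʊˈ'
_punctuations = ',. '
all_phonemes = set(_phonemes + _punctuations)


def collapse_whitespace(text: str) -> str:
    """Assumed behaviour: squeeze whitespace runs, drop spaces around , and ."""
    text = re.sub(r'\s+', ' ', text)
    return re.sub(r' ?([,.]) ?', r'\1', text)


def postprocess(text: str) -> str:
    # Every character outside the symbol set is silently dropped.
    text = ''.join(c for c in text if c in all_phonemes)
    return collapse_whitespace(text)


raw = 'ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .'
result = postprocess(raw)
print(result)
```

Under these assumptions the printed string contains no digits and no spaces around the comma. The dental diacritic under the t is dropped as well, because it is a separate combining character that is not in the symbol set.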
Hi,
The whitespace collapse is intentional, mostly to make it possible to control where the pauses are allocated with the forward model. You can disable it by removing the call at line 91 in data/text/tokenizer.py (return the result of the line above instead). I would discourage that, though, unless you're running into problems.
For the numbers issue, you can add the missing phonemes (for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 0) to all_phonemes in data/text/symbols.py, like so:
all_phonemes = sorted(list(_phonemes) + list(_punctuations) + list('1234567890'))
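Before and after such a change, it can help to check which characters the filter would still drop. This is only a sketch: `_phonemes` and `_punctuations` below are shortened, hypothetical stand-ins for the real strings in data/text/symbols.py, and `dropped_characters` is a helper invented for illustration.

```python
# Shortened, hypothetical stand-ins for the real symbol strings.
_phonemes = 'ɡɔɜloɪzaːtɕuŋwɗsɛmʂʊˈ'
_punctuations = ',. '

base_symbols = sorted(list(_phonemes) + list(_punctuations))
extended_symbols = sorted(
    list(_phonemes) + list(_punctuations) + list('1234567890')
)


def dropped_characters(text, symbols):
    """Return the distinct characters of `text` that are not in `symbols`."""
    allowed = set(symbols)
    return sorted({c for c in text if c not in allowed})


raw = 'ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .'
dropped_base = dropped_characters(raw, base_symbols)
dropped_extended = dropped_characters(raw, extended_symbols)
print(dropped_base)      # includes the digits '1' and '7'
print(dropped_extended)  # no digits left in this list
```

Any character that still shows up in the second list, such as a combining diacritic, would also need to be added to the symbol set if it should survive postprocessing.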
I was not aware that some languages had numbers as phonemes.
TODO: Add optional extra phonemes string to data_config.yaml
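One way the TODO could look is sketched below. This is purely illustrative: the `extra_phonemes` key and the `build_symbols` helper are hypothetical names, and a plain dict stands in for the parsed data_config.yaml. Unlike the one-liner above, this version also removes duplicate symbols by using sets.

```python
def build_symbols(phonemes, punctuations, config):
    """Merge the base symbols with an optional extra string from the config.

    `config` is a plain dict standing in for the parsed data_config.yaml;
    the 'extra_phonemes' key is a hypothetical name.
    """
    extra = config.get('extra_phonemes') or ''
    return sorted(set(phonemes) | set(punctuations) | set(extra))


# Without the key, the symbol set is unchanged.
default_symbols = build_symbols('ab', ',.', {})

# With the key, the digits become valid symbols.
vi_symbols = build_symbols('ab', ',.', {'extra_phonemes': '1234567890'})
print(default_symbols)
print(vi_symbols)
```

Treating a missing or empty value as "no extras" keeps existing configs working without changes.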
Thank you for the clarification. Making the phonemes configurable would be super helpful. I'll try your suggestion.