
unique_text_tokens.k2symbols for non-English languages

Open paulovasconcellos-hotmart opened this issue 1 year ago • 1 comments

Hello everyone, I've noticed that unknown tokens are removed throughout the pipeline, and that unique_text_tokens.k2symbols doesn't contain all the phonemes needed for non-English languages, such as accented characters and other diacritics.

I'm trying to train pheme in Portuguese, and I was wondering what I should do so that the model can handle the accents of my language. Any tips on how to do it?

P.S.: I've also changed the phonemizer backend, so it could generate phonemes in PT-BR. espeak is available in PT-BR, so it was a no-brainer.
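
To make the question concrete, here is a minimal sketch of one possible direction: rebuilding the symbol table from the phonemized PT-BR corpus so that no phoneme ends up unknown. Everything in it is an assumption rather than pheme's actual code: the function names `build_symbol_table` and `format_symbol_table` are hypothetical, tokens are taken to be single Unicode code points (so a combining diacritic becomes its own symbol), and the output follows the common `symbol id` one-per-line layout with `<eps>` as id 0, which may not match what pheme expects.

```python
from typing import Dict, Iterable, Sequence


def build_symbol_table(
    phonemized_lines: Iterable[str],
    specials: Sequence[str] = ("<eps>",),  # assumed reserved symbol, id 0
) -> Dict[str, int]:
    """Assign an integer id to every distinct non-space character seen."""
    seen = set()
    for line in phonemized_lines:
        # Each code point is treated as one token; whitespace is skipped.
        seen.update(ch for ch in line if not ch.isspace())
    # Reserved symbols first, then the rest sorted by code point so the
    # mapping is reproducible across runs.
    symbols = list(specials) + sorted(seen)
    return {sym: idx for idx, sym in enumerate(symbols)}


def format_symbol_table(table: Dict[str, int]) -> str:
    """Render the table as 'symbol id' lines (assumed file layout)."""
    return "".join(f"{sym} {idx}\n" for sym, idx in table.items())


if __name__ == "__main__":
    # Stand-in for phonemizer output; real input would come from espeak.
    sample = ["a ʃ", "ʃ ɔ"]
    print(format_symbol_table(build_symbol_table(sample)), end="")
```

The resulting text could then be written to a file with `encoding="utf-8"` and compared against the shipped unique_text_tokens.k2symbols to see which symbols are missing. Whether the model's embedding size also has to change to match the new table is something I have not verified.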