[G2pPack + Phonetic Assistant] Give same phonetic result for uppercase and lowercase graphemes
Currently, in the Phonetic Assistant as well as the DiffSinger G2P phonemizers, uppercase graphemes get a different phonetic result when compared to lowercase graphemes. This is inconvenient since the end user may sometimes capitalize words, and sometimes not. If the end user wants to use a different pronunciation, they can use number suffixes, e.g. the(1).
In theory, this issue could affect any G2P-powered function (such as phonemizers), but in practice it currently only affects the Phonetic Assistant as well as the DiffSinger G2P phonemizers.
What this PR does NOT do
- Affect `SP` and `AP` (this has been tested). If they are defined in the dictionary, or the dictionary contains no graphemes, they will work normally. (Note that they have to be defined in their uppercase form in the dsdict if there are any conflicting graphemes, e.g. lowercase `sp` and/or `ap`; however, this is already the case today.)
- Related to the above: any capitalized graphemes that are manually defined in a custom dsdict (e.g. `KA` vs. `ka`) will not be affected either, so you can still distinguish by capitalization manually if desired.
- Affect phonemes. This affects G2P graphemes (e.g. words) only.
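The lookup order implied by the notes above can be sketched like this. An exact dictionary entry wins, so manually defined uppercase graphemes keep their own pronunciation. Otherwise the lowercase form is used. This is a minimal Python sketch, not OpenUtau's actual C# code. The `phonemize` function, the sample entries, and the assumption that the G2P model is a plain callable fed the lowercase grapheme are all hypothetical. Number suffixes such as `the(1)` are left out.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a dsdict-like table (grapheme -> phonemes)
# and a G2P model exposed as a plain callable.
Dictionary = Dict[str, List[str]]
G2p = Callable[[str], List[str]]


def phonemize(grapheme: str, entries: Dictionary, g2p: G2p) -> List[str]:
    """Return phonemes for a grapheme, ignoring case unless an exact entry exists."""
    # 1. Exact match first: manually defined graphemes such as "KA" or "SP"
    #    keep their own entries, so case can still be distinguished on purpose.
    if grapheme in entries:
        return entries[grapheme]
    # 2. Fall back to the lowercase form, so "The", "THE" and "the" agree.
    lowered = grapheme.lower()
    if lowered in entries:
        return entries[lowered]
    # 3. Unknown word: ask the G2P model (assumed here to expect lowercase input).
    #    Only the grapheme is normalized; returned phonemes are never touched.
    return g2p(lowered)
```

For example, with `entries = {"the": ["dh", "ah"], "KA": ["k", "a:"], "ka": ["k", "a"]}`, the graphemes `The` and `THE` both resolve to `["dh", "ah"]`. `KA` keeps its manual entry, and `Ka` falls back to `ka`.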
It's not always correct to do this. Acronyms like CIA should be pronounced differently.
It works like this in the classic phonemizers as well, so I wanted it to work the same across the board. Perhaps I could ignore all-caps instances though.
That should be a decision per phonemizer. If it's a Japanese one, where ka, KA, and Ka should all be treated the same, sure. For English, uppercase and lowercase shouldn't be treated the same.
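The two positions in this discussion could be combined as a per-phonemizer setting. One is deciding case handling per phonemizer. The other is possibly leaving all-caps words such as acronyms alone. The Python sketch below is illustrative only. `CasePolicy`, `normalize`, and the three example policies are hypothetical names, not part of OpenUtau. The rule "more than one letter and all uppercase means acronym" is an assumption, chosen so that a single-letter word like `I` is still folded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasePolicy:
    """Hypothetical per-phonemizer setting for grapheme case handling."""
    fold_case: bool = True       # treat "ka", "Ka" and "KA" as the same grapheme
    keep_all_caps: bool = False  # leave multi-letter all-caps words (e.g. "CIA") as-is


def normalize(grapheme: str, policy: CasePolicy) -> str:
    """Return the grapheme key a phonemizer would look up under the given policy."""
    if not policy.fold_case:
        return grapheme  # case-sensitive phonemizer: no change
    # Assumed acronym rule: more than one character and every cased letter uppercase.
    if policy.keep_all_caps and len(grapheme) > 1 and grapheme.isupper():
        return grapheme
    return grapheme.lower()


# Example policies reflecting the options raised in the discussion.
JAPANESE = CasePolicy(fold_case=True)
ENGLISH_STRICT = CasePolicy(fold_case=False)
ENGLISH_ACRONYM_AWARE = CasePolicy(fold_case=True, keep_all_caps=True)
```

Under these policies, `KA` maps to `ka` for `JAPANESE`. `ENGLISH_STRICT` leaves `The` unchanged. `ENGLISH_ACRONYM_AWARE` keeps `CIA` but folds `The` to `the`. Which English policy to adopt remains the open question in the thread.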