
[G2pPack + Phonetic Assistant] Give same phonetic result for uppercase and lowercase graphemes

Open lottev1991 opened this issue 1 year ago • 3 comments

Currently, in both the Phonetic Assistant and the DiffSinger G2P phonemizers, uppercase graphemes produce a different phonetic result than lowercase graphemes. This is inconvenient because end users capitalize words inconsistently. A user who wants a different pronunciation can still use number suffixes, e.g. the(1).

In theory, this issue could affect any G2P-powered function (such as phonemizers), but in practice it currently only affects the Phonetic Assistant as well as the DiffSinger G2P phonemizers.

What this PR does NOT do

  • Affect SP and AP (this has been tested). If they are defined in the dictionary, or the dictionary contains no graphemes, they will work normally. Note that if the dsdict contains any conflicting graphemes (e.g. lowercase sp and/or ap), SP and AP have to be defined in their uppercase form; however, this is already the case today.
  • Related to the above: any capitalized graphemes that are manually defined in a custom dsdict (e.g. KA vs. ka) are not affected either, so you can still distinguish by capitalization manually if desired.
  • Affect phonemes. This affects G2P graphemes (e.g. words) only.
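
To illustrate the intended lookup order, here is a minimal Python sketch. It is not the actual OpenUtau code: the `lookup` function, the toy `dsdict` contents, and `toy_g2p` are all hypothetical, and the order shown (exact dictionary match first, then a lowercased match, then G2P on the lowercased word) is an assumption based on the bullets above.

```python
def lookup(word, dictionary, g2p):
    """Return phonemes for `word`, preferring exact dictionary entries.

    Hypothetical helper: an exact-case entry (e.g. SP, AP, KA) always wins,
    so manually defined uppercase graphemes keep their own pronunciation.
    """
    if word in dictionary:
        return dictionary[word]
    lowered = word.lower()
    if lowered in dictionary:
        return dictionary[lowered]
    # Fall back to G2P prediction on the case-folded word.
    return g2p(lowered)


# Toy dictionary standing in for a custom dsdict (made-up entries).
dsdict = {
    "SP": ["SP"],
    "AP": ["AP"],
    "KA": ["k", "A"],
    "ka": ["k", "a"],
    "the": ["dh", "ah"],
}


def toy_g2p(word):
    # Stand-in for a real G2P model: one "phoneme" per letter.
    return list(word)
```

With this order, `lookup("The", dsdict, toy_g2p)` and `lookup("the", dsdict, toy_g2p)` give the same result, while `KA` and `ka` stay distinct because both are defined explicitly.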

lottev1991 avatar Jul 08 '24 02:07 lottev1991

It's not always correct to do this. Acronyms like CIA should be pronounced differently.

stakira avatar Sep 01 '24 20:09 stakira

It works like this in the classic phonemizers as well, so I wanted it to work the same across the board. Perhaps I could ignore all-caps instances though.
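
A rough Python sketch of what ignoring all-caps instances could look like. The `fold_case` name and the `keep_all_caps` flag are hypothetical and not part of OpenUtau; treating single-letter words as ordinary words is also an assumption.

```python
def fold_case(word, keep_all_caps=True):
    """Lowercase `word` unless it looks like an all-caps acronym.

    Hypothetical helper: words of two or more characters that are entirely
    uppercase (e.g. CIA) are returned unchanged when keep_all_caps is True.
    """
    if keep_all_caps and len(word) > 1 and word.isupper():
        return word
    return word.lower()
```

Here `fold_case("The")` returns `"the"` while `fold_case("CIA")` is left as `"CIA"`; passing `keep_all_caps=False` folds everything to lowercase.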

lottev1991 avatar Sep 01 '24 22:09 lottev1991

That should be a decision per phonemizer. If it's a Japanese one where ka, KA, and Ka should all be treated the same, sure. For English, uppercase and lowercase shouldn't be treated the same.

stakira avatar Sep 03 '24 00:09 stakira