[Quick Accent] Add Middle Eastern Romanization

Open PesBandi opened this issue 11 months ago • 14 comments

Summary of the Pull Request

Not ready for merging. Adds characters used for Middle Eastern romanization.

PR Checklist

  • [x] Closes: #31572
  • [x] Communication: I've discussed this with core contributors already. If work hasn't been agreed, this work might be rejected
  • [x] Tests: Added/updated and all pass
  • [x] Localization: All end user facing strings can be localized
  • [x] Dev docs: No need
  • [x] New binaries: No new binaries
  • [x] Documentation updated: No need

Validation Steps Performed

Observing

PesBandi avatar Mar 13 '24 18:03 PesBandi

Hello, the PR is not ready yet, but I have some questions I hope might get answered.

  1. I'm not sure about the naming. Right now I am using ME in the code and "Middle Eastern Romanization" for the user-facing string, but I'd appreciate some feedback on that.
  2. The issue requests ı, İ, ı̇̄, İ̄. The problem with the first two letters is that there will always be an uppercase letter (İ) among the lowercase letters and a lowercase letter (ı) with the uppercase ones (when holding Shift/CapsLock). I've been considering making it so that "ı" will be capitalized into "İ". I know that "İ" is not the uppercase version of the dotless i, but to me it somewhat makes sense. Although it might seem unintuitive, I don't think it will impact user experience, since the user would press Shift anyway if they wanted an uppercase letter and vice versa. I believe that most people wouldn't wonder where the dotless i disappears when they are holding Shift or have CapsLock on. I'd like to hear your thoughts on this.
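To illustrate the pairing proposed in point 2, here is a minimal Python sketch. It is not the actual Quick Accent code: the table and function names are hypothetical, and it only handles single code points. The idea is that the picker consults a small override table before falling back to the default Unicode case mapping, which pairs "ı" with "I" rather than with "İ".

```python
# Hypothetical casing overrides for the picker; names are illustrative only.
DOTLESS_I = "\u0131"  # ı (LATIN SMALL LETTER DOTLESS I)
DOTTED_CAP_I = "\u0130"  # İ (LATIN CAPITAL LETTER I WITH DOT ABOVE)

UPPER_OVERRIDES = {DOTLESS_I: DOTTED_CAP_I}
LOWER_OVERRIDES = {DOTTED_CAP_I: DOTLESS_I}


def picker_upper(ch: str) -> str:
    """Return the character shown while Shift/CapsLock is active."""
    return UPPER_OVERRIDES.get(ch, ch.upper())


def picker_lower(ch: str) -> str:
    """Return the character shown when no Shift/CapsLock is active."""
    return LOWER_OVERRIDES.get(ch, ch.lower())


# The default Unicode mapping sends dotless i to a plain capital I,
# which is why an explicit override is needed for the proposed pairing.
assert DOTLESS_I.upper() == "I"
assert picker_upper(DOTLESS_I) == DOTTED_CAP_I
assert picker_lower(DOTTED_CAP_I) == DOTLESS_I
```

With this approach, every other character keeps its normal case mapping, and only the one unusual pair is special-cased.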

PesBandi avatar Mar 14 '24 20:03 PesBandi

Not sure about the name here 🤔 @ethanfangg, are there any best practices here from other Microsoft products? Would "Middle East Romanization" be a correct name for this?

jaimecbernardo avatar Mar 27 '24 13:03 jaimecbernardo

@PesBandi what languages are included in this set for romanization?

ethanfangg avatar Apr 15 '24 16:04 ethanfangg

https://en.wikipedia.org/wiki/ISO_233

Seems like we could use "ISO 233". Otherwise, I think we could offer multiple options, listing the affected languages individually?

ethanfangg avatar Apr 15 '24 16:04 ethanfangg

@ethanfangg

  • "ISO 233" won't quite work because this set Romanizes not just Arabic and Persian (ISO 233-2:1993 and ISO 233-3:1999) but also Armenian, Hebrew, Syriac, Ottoman Turkish, and possibly other Middle Eastern languages, if they ended up being covered inadvertently.
  • Each of these languages has three to ten Romanization standards. There is usually such a high degree of (messy) overlap between languages and standards that it makes sense to group them all together under something like "Middle Eastern Romanization." (To get a sense of the messiness and overlap, you can look at the tables at https://en.wikipedia.org/wiki/Romanization_of_Armenian#Transliteration_tables and https://en.wikipedia.org/wiki/Romanization_of_Persian#Main_romanization_schemes.)

ohaniandaniel avatar Apr 15 '24 18:04 ohaniandaniel

@jaimecbernardo @PesBandi For naming, this is a bit of uncharted territory. After chatting with some globalization folks, I think the guidance is this: for the applicable languages that might benefit from these additional combinations, we add the new characters to those existing sets, and if there are additional languages, we create new language-specific sets (for the list of options in Quick Accent) rather than trying to specify a blanket identifier.

ethanfangg avatar Apr 24 '24 16:04 ethanfangg

@ethanfangg I don't think creating a character set for every language is feasible due to the sheer number of languages. There would also be the difficulty of determining which characters are used by which language. There's a reason why they are all grouped under ISO 233.

Speaking of ISO 233, it is not a good name for several reasons besides those already mentioned. One major issue is that I can't verify if the characters in this PR are really the ones from ISO 233, as the current version isn't publicly available. Many people also don't know what ISO 233 is.

I understand why you are against creating a blanket identifier, and I don't like the idea either. However, this is a very broad group of characters. I can't tell for sure if it's ISO 233, and it also isn't any particular language. No one can really tell what it is, other than characters used for romanization of Middle Eastern languages, so I think calling it something like that is the only option.

PesBandi avatar May 27 '24 18:05 PesBandi

Hey @PesBandi & @jaimecbernardo. My thought here is the following:

Those with Quick Accent enabled fall into one of two categories:

  1. For character set, they show "All available"
  2. For character set, they've chosen a specific character set (e.g. "Turkish" - Note: PowerToys only has 34 total sets listed)

I propose either of the following solutions.

Solution 1: End users who want the "Middle Eastern romanization" set and have "All available" chosen already have their needs met, so we really only need to think about users who select a specific character set (e.g. "Turkish"). For the languages in our current set of 34, I posit that we should just add the "Middle Eastern romanization" characters to each language set that uses them (i.e. Turkish, Hebrew, and any other affected languages). Then, for example, a user who has Turkish as their selected language would now see the "Middle Eastern romanization" characters. A user who has selected "All available" would, obviously, see them too (we would need to make sure the same character doesn't show up multiple times in a given options list, if that's an issue).

Solution 2: Introduce a brand new, 35th character set called [INSERT APPROPRIATE NAME] that contains all of the "Middle Eastern romanization" characters. As in Solution 1, a user who has selected "All available" would see these characters, and so would a user who selects the "[INSERT APPROPRIATE NAME]" set. However, users selecting languages like "Turkish" would not get these characters. This feels wrong, but may be simpler from an implementation perspective (IF we can find a good name).

Of course, in either solution, we run into the issue of certain languages simply not existing in the PowerToys character sets (e.g. Arabic, Persian/Farsi), but I think those problems could be identified via folks creating issues and then solved in future PRs. If there is a specific language that doesn't exist that you would want to see added, then I suggest you do the work to create the set for just that language.

I personally prefer solution 1, because although we don't have every language in our current set of 34 languages listed, it's more true to the expectation of selecting an actual language from the drop down rather than a set of characters.
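To make the difference between the two solutions concrete, here is a rough Python sketch. It is not how Quick Accent is actually implemented: the data is purely illustrative (candidate characters for a single key), the set name is a placeholder, and every function name is hypothetical. It also shows one way to keep "All available" from listing the same character twice.

```python
# Illustrative data only: candidates offered for one key (e.g. "s").
LANGUAGE_SETS = {"Turkish": ["ş"]}
ROMANIZATION = ["ş", "ṣ", "š"]
ROMANIZATION_SET_NAME = "[INSERT APPROPRIATE NAME]"  # placeholder, as above


def merge_unique(*char_lists):
    """Concatenate lists, keeping only the first occurrence of each character."""
    return list(dict.fromkeys(ch for chars in char_lists for ch in chars))


def solution_1(selected):
    """Fold the romanization characters into the affected language sets.

    In this toy example every language set is assumed to be affected.
    """
    if selected == "All available":
        return merge_unique(*LANGUAGE_SETS.values(), ROMANIZATION)
    return merge_unique(LANGUAGE_SETS[selected], ROMANIZATION)


def solution_2(selected):
    """Expose the romanization characters as a separate, selectable set."""
    all_sets = {**LANGUAGE_SETS, ROMANIZATION_SET_NAME: ROMANIZATION}
    if selected == "All available":
        return merge_unique(*all_sets.values())
    return list(all_sets[selected])


print(solution_1("Turkish"))        # Turkish plus romanization characters
print(solution_2("Turkish"))        # Turkish characters only
print(solution_2("All available"))  # everything, without duplicates
```

In both variants "All available" ends up with the same de-duplicated list; the only difference is what a user who picks a single language gets to see.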

ethanfangg avatar Jun 21 '24 01:06 ethanfangg

Hello @ethanfangg, thank you for your response. Here are some things I want to point out.

Solution 1: There are currently only three Middle Eastern languages in Quick Accent:

  • Turkish: uses Latin characters
  • Kurdish: The characters in Quick Accent are the ones used for romanization
  • Hebrew: The Quick Accent set contains the characters used for writing Hebrew, not the romanization ones

Therefore, Solution 1 would mean just adding the characters to Hebrew, so it would be a mix of the Latin characters used for transliteration and the Hebrew script characters.

Solution 2: The first issue is that we can't decide on an appropriate name. The second issue is that Kurdish already has its romanization characters, meaning there would be one set for romanization characters specific to Kurdish and one for all Middle Eastern languages.

I don't know what the original issue was trying to achieve, so I think we should ask @ohaniandaniel.

PesBandi avatar Jun 22 '24 09:06 PesBandi

@PesBandi, @ethanfangg, having read through all of your comments, I find Ethan's Solution 2 to be the best. I suggest calling the character set "Middle Eastern Romanization."

ohaniandaniel avatar Jun 22 '24 09:06 ohaniandaniel

Have we reached a consensus here about the way forward? @ethanfangg, are you OK with Solution 2: a new character set called "Middle Eastern Romanization"?

stefansjfw avatar Aug 12 '24 09:08 stefansjfw

Hi, @ethanfangg. I'm the person who filed the original request that led to the creation of this pull request. Is there anything I can do to help move this along? It seems like we're all on the same page now.

ohaniandaniel avatar Sep 08 '24 18:09 ohaniandaniel

Thanks for the bump @ohaniandaniel

@stefansjfw I'm fine with calling it Middle Eastern Romanization for now - we may eventually need to split it into individual languages if that's a more appropriate solution, but based on internal feedback, I don't think it is worth continuing to be blocked on naming.

ethanfangg avatar Sep 10 '24 18:09 ethanfangg

@jaimecbernardo fyi on updates here

ethanfangg avatar Sep 11 '24 16:09 ethanfangg

Hi @PesBandi, I've taken a look at the PR again and feel like it's ready to go in. I wanted to change some internals and make it more about the "Romanization" part of "Middle Eastern Romanization", since in the future we might end up adding more "Romanization" characters in here. I merged the latest main in and changed internal strings like "ME" to "ROM". https://github.com/microsoft/PowerToys/pull/31905/commits/86732a8517bfacd860b912603e9d42f0940916f5

Hope you're OK with these changes.

jaimecbernardo avatar Sep 23 '24 16:09 jaimecbernardo

Hi, thanks for the review, I'm glad that it's moving closer to getting merged. I totally agree with your changes.

PesBandi avatar Sep 24 '24 06:09 PesBandi