PowerToys
[Quick Accent] Add Middle Eastern Romanization
Summary of the Pull Request
Not ready for merging. Adds characters used for Middle Eastern romanization.
PR Checklist
- [x] Closes: #31572
- [x] Communication: I've discussed this with core contributors already. If work hasn't been agreed, this work might be rejected
- [x] Tests: Added/updated and all pass
- [x] Localization: All end user facing strings can be localized
- [x] Dev docs: No need
- [x] New binaries: No new binaries
- [x] Documentation updated: No need
Validation Steps Performed
Observing
Hello, the PR is not ready yet, but I have some questions I hope might get answered.
- I'm not sure about the naming. Right now I am using `ME` in the code and "Middle Eastern Romanization" for the user-facing string, but I'd appreciate some feedback on that.
- The issue requests ı, İ, ı̇̄, İ̄. The problem with the first two letters is that there will always be an uppercase letter (İ) among the lowercase letters and a lowercase letter (ı) with the uppercase ones (when holding Shift/CapsLock). I've been considering making it so that "ı" will be capitalized into "İ". I know that "İ" is not the uppercase version of the dotless i, but to me it somewhat makes sense. Although it might seem unintuitive, I don't think it will impact user experience, since the user would press Shift anyway if they wanted an uppercase letter and vice versa. I believe that most people wouldn't wonder where the dotless i disappears when they are holding Shift or have CapsLock on. I'd like to hear your thoughts on this.
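To make the casing question concrete, here is a minimal Python sketch of what default Unicode case mapping does with these two letters, plus one way the proposed pairing could be expressed. The names `CASE_OVERRIDES` and `to_upper_for_picker` are hypothetical and only illustrate the idea; they are not taken from the PowerToys code.

```python
DOTLESS_I = "\u0131"     # ı  LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Hypothetical override table: pairs ı with İ, as proposed in the comment above.
CASE_OVERRIDES = {DOTLESS_I: DOTTED_CAP_I}


def to_upper_for_picker(ch: str) -> str:
    """Return the character to show while Shift/CapsLock is active.

    Falls back to the default Unicode uppercase mapping when no
    override is defined for the character.
    """
    return CASE_OVERRIDES.get(ch, ch.upper())


# Default mapping: ı uppercases to a plain ASCII "I", not to İ.
default_upper = DOTLESS_I.upper()

# Default mapping: İ lowercases to "i" followed by a combining dot above.
default_lower = DOTTED_CAP_I.lower()

# With the override, ı is shown as İ while Shift is held.
picker_upper = to_upper_for_picker(DOTLESS_I)
```

Under default case mapping neither letter maps to the other, which is why a lowercase and an uppercase letter would otherwise always appear side by side in the picker. The override trades linguistic accuracy for a consistent list.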
Not sure about the name here 🤔 @ethanfangg, any best practices here from other Microsoft products? Would "Middle East Romanization" be a correct name for this?
@PesBandi what languages are included in this set for romanization?
https://en.wikipedia.org/wiki/ISO_233
Seems like we could use "ISO 233". Otherwise, I think we could have multiple options, listing the affected languages individually?
@ethanfangg
- "ISO 233" won't quite work because this set Romanizes not just Arabic and Persian (ISO 233-2:1993 and ISO 233-3:1999) but also Armenian, Hebrew, Syriac, Ottoman Turkish, and possibly other Middle Eastern languages, if they ended up being covered inadvertently.
- Each of these languages has three to ten Romanization standards. There is usually such a high degree of (messy) overlap between languages and standards that it makes sense to group them all together under something like "Middle Eastern Romanization." (To get a sense of the messiness and overlap, you can look at the tables at https://en.wikipedia.org/wiki/Romanization_of_Armenian#Transliteration_tables and https://en.wikipedia.org/wiki/Romanization_of_Persian#Main_romanization_schemes.)
@jaimecbernardo @PesBandi For naming, this is a bit of uncharted territory, but chatting with some globalization folks, I think guidance is that for the applicable languages that might benefit from these additional combinations, we add the new functionality to those sets and if there are additional languages, create new specific languages (for the list of options in Quick Accent) rather than trying to specify a blanket identifier?
@ethanfangg I don't think creating a character set for every language is feasible due to the sheer number of languages. There would also be the difficulty of determining which characters are used by which language. There's a reason why they are all grouped under ISO 233.
Speaking of ISO 233, it is not a good name for several reasons besides those already mentioned. One major issue is that I can't verify if the characters in this PR are really the ones from ISO 233, as the current version isn't publicly available. Many people also don't know what ISO 233 is.
I understand why you are against creating a blanket identifier. I also don't like the idea; however, this is a very broad group of characters. I can't tell for sure if it's ISO 233, and it also isn't any particular language. No one can really tell what it is, other than characters used for romanization of Middle Eastern languages, so I think calling it something like that is the only option.
Hey @PesBandi & @jaimecbernardo. My thought here is the following:
Those with Quick Accent enabled fall into one of two categories:
- For character set, they show "All available"
- For character set, they've chosen a specific character set (e.g. "Turkish" - Note: PowerToys only has 34 total sets listed)
I propose either of the following solutions.
Solution 1: Given that, for the end user who does want the "middle eastern romanization" set, if they have "All available" chosen, their needs are met. We really only need to think about the users who want to select a specific character set (e.g. "Turkish"). That said, I posit that for the current languages in our set of 34 languages, we should just add the "middle eastern romanization" set to each of the language sets that use those characters (i.e. Turkish and Hebrew and any other languages affected). Then, for example, if a user has Turkish as their selected language, they would now see the "middle eastern romanization" characters. If a user has selected "All available" then, obviously, they too would see the "middle eastern romanization" characters (we would need to make sure it doesn't show the same character multiple times for a given options list, if that's an issue).
Solution 2: Introduce a brand new, 35th character set that we call [INSERT APPROPRIATE NAME] that contains all of the "middle eastern romanization" characters. Like Solution 1, if a user has selected "All available" then they would see the "middle eastern romanization" characters, and if a user selects the "[INSERT APPROPRIATE NAME]" set, then, again, they would see the "middle eastern romanization" characters; however, users selecting languages like "Turkish" would not get these characters. This feels wrong, but may be simpler from an implementation perspective (if we can find a good name).
Of course, in either solution, we run into the issue of certain languages simply not existing in the PowerToys character set (i.e. Arabic, Persian/Farsi, etc.) but I think those are problems that could be identified via folks creating issues and then solved in future PRs. If there is a specific language that doesn't exist that you would want to see added, then I suggest you do the work to create the set for just that language.
I personally prefer solution 1, because although we don't have every language in our current set of 34 languages listed, it's more true to the expectation of selecting an actual language from the drop down rather than a set of characters.
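The difference between the two solutions, and the de-duplication concern raised under Solution 1, can be sketched roughly as follows. This is only an illustration in Python: the character data is made up for the example, and `merge_sets` is a hypothetical helper, not how Quick Accent actually stores or combines its sets.

```python
from typing import Dict, List

CharSet = Dict[str, List[str]]

# Illustrative data only; the real Quick Accent tables are not reproduced here.
TURKISH: CharSet = {"i": ["ı"], "s": ["ş"]}
ROMANIZATION: CharSet = {"i": ["ı", "ī"], "s": ["ş", "š"]}


def merge_sets(*sets: CharSet) -> CharSet:
    """Combine character sets per key, keeping first-seen order and
    dropping characters that already appear for that key."""
    merged: CharSet = {}
    for char_set in sets:
        for key, chars in char_set.items():
            bucket = merged.setdefault(key, [])
            for ch in chars:
                if ch not in bucket:
                    bucket.append(ch)
    return merged


# Solution 1: the romanization characters are folded into each affected language.
solution_1_turkish = merge_sets(TURKISH, ROMANIZATION)

# Solution 2: "Turkish" stays as it is; romanization is its own selectable set.
solution_2_turkish = TURKISH

# "All available" shows the union either way, without repeated characters.
all_available = merge_sets(TURKISH, ROMANIZATION)
```

In this sketch, a user who selects only "Turkish" sees the extra characters under Solution 1 but not under Solution 2, while "All available" behaves the same in both cases.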
Hello @ethanfangg, thank you for your response. Here are some things I want to point out.
Solution 1: There are currently only three Middle Eastern languages in Quick Accent:
- Turkish: uses Latin characters
- Kurdish: The characters in Quick Accent are the ones used for romanization
- Hebrew: The Quick Accent set contains the characters used for writing Hebrew, not the romanization ones
Therefore, Solution 1 would mean just adding the characters to Hebrew, so it would be a mix of the Latin characters used for transliteration and the Hebrew script characters.
Solution 2: The first issue is that we can't decide on an appropriate name. The second issue is that Kurdish already has its romanization characters, meaning there would be one set for romanization characters specific to Kurdish and one for all Middle Eastern languages.
I don't know what the original issue was trying to achieve, so I think we should ask @ohaniandaniel.
@PesBandi, @ethanfangg, having read through all of your comments, I find Ethan's solution 2 to be the best. I suggest calling the character set "Middle Eastern Romanization."
Have we reached a consensus here about the way forward? @ethanfangg, are you OK with Solution 2: a new character set called "Middle Eastern Romanization"?
Hi, @ethanfangg. I'm the person who filed the original request that led to the creation of this pull request. Is there anything I can do to help move this along? It seems like we're all on the same page now.
Thanks for the bump @ohaniandaniel
@stefansjfw I'm fine with calling it Middle Eastern Romanization for now. We may eventually need to split it into individual languages if that's a more appropriate solution, but based on internal feedback I don't think it is worth continuing to be blocked on naming.
@jaimecbernardo FYI on updates here
Hi @PesBandi, I've taken a look at the PR again and feel like it's ready to go in. I wanted to change some internals and make it more about the "Romanization" part of "Middle Eastern Romanization", since in the future we might end up adding more "Romanization" characters in here. I merged the latest main in and changed internal strings like "ME" to "ROM". https://github.com/microsoft/PowerToys/pull/31905/commits/86732a8517bfacd860b912603e9d42f0940916f5
Hope you're OK with these changes.
Hi, thanks for the review, I'm glad that it's moving closer to getting merged. I totally agree with your changes.