
Pinyin

Open cjbarth opened this issue 3 years ago • 14 comments

Would a PR be welcome to add support for pinyin vowels, or perhaps for all pinyin combinations? Pinyin is typed by entering the letters of a syllable and then a number indicating the tone. That is how you'd end up with nǐhǎo, which is currently not possible to type with WinCompose. I can't even type ǐ or ǎ. The closest I can get is ĭ and ă, which aren't the same characters. For those letters I would expect [compose]Vi and [compose]Va, to match the existing [compose]Ui and [compose]Ua patterns. If desired, I could put together a PR that includes all the valid pinyin combinations following this table, generally matching a regex like /[a-zA-Z]{1,6}[0-9]/. I don't think such a pattern would cause any conflicts, given that every combination would end in a number.

If that isn't appreciated, I could create a lesser PR to include just the vowels used in pinyin, which I think would only be 5 additional characters for the 3rd tone.
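To make the caron/breve distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `with_caron` is made up for illustration; it is not part of WinCompose.

```python
import unicodedata

# Third-tone pinyin vowels carry a caron; the look-alikes carry a breve.
# Escapes are used so the exact code points are unambiguous.
PAIRS = [
    ("\u01ce", "\u0103"),  # a with caron vs. a with breve
    ("\u01d0", "\u012d"),  # i with caron vs. i with breve
]

def with_caron(vowel: str) -> str:
    """Combine a base letter with U+030C COMBINING CARON into one code point (NFC)."""
    return unicodedata.normalize("NFC", vowel + "\u030c")

for caron, breve in PAIRS:
    print(f"{caron} U+{ord(caron):04X} {unicodedata.name(caron)}")
    print(f"{breve} U+{ord(breve):04X} {unicodedata.name(breve)}")
```

The two characters in each pair look alike in many fonts but are different code points, which is why the existing U-based sequences cannot stand in for the third tone.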

cjbarth avatar Feb 27 '21 01:02 cjbarth

That sounds like a good idea. Would the whole pinyin set conflict with any existing rule?

samhocevar avatar Feb 27 '21 09:02 samhocevar

I doubt it, @samhocevar, but I don't know a good way to figure that out; I can't easily look through each of the 4,000+ rules to check. Thoughts?
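One way to check without reading every rule by hand would be a small script. The sketch below assumes the rules are stored in XCompose-style lines such as `<Multi_key> <U> <a> : "ă"`; the function names `parse_rules` and `find_conflicts` are hypothetical, and the sketch ignores any case-insensitive or swapped-order matching a compose tool may apply.

```python
import re

# Assumed rule shape: one or more <key> tokens, a colon, then a quoted result.
RULE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')

def parse_rules(text):
    """Map each key sequence (tuple of key names, compose key dropped) to its output."""
    rules = {}
    for line in text.splitlines():
        m = RULE.match(line)
        if not m:
            continue  # comments, blank lines, anything not shaped like a rule
        keys = tuple(re.findall(r"<([^>]+)>", m.group(1)))
        if keys and keys[0] == "Multi_key":
            keys = keys[1:]
        if not keys:
            continue
        rules[keys] = m.group(2)
    return rules

def find_conflicts(rules, candidate):
    """Return existing sequences that equal the candidate, or where one is a prefix of the other."""
    candidate = tuple(candidate)
    hits = []
    for seq in rules:
        n = min(len(seq), len(candidate))
        if seq[:n] == candidate[:n]:
            hits.append(seq)
    return hits

# Tiny illustrative sample, not the real rule file.
SAMPLE = """
<Multi_key> <U> <a> : "ă"
<Multi_key> <U> <i> : "ĭ"
<Multi_key> <c> <a> : "ǎ"
"""

rules = parse_rules(SAMPLE)
print(find_conflicts(rules, ("V", "a")))  # nothing in the sample starts with V
```

Prefix checks matter because a sequence that is a prefix of another would fire before the longer one could be completed.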

cjbarth avatar Feb 27 '21 13:02 cjbarth

I don't have access to a computer right now but I can check tomorrow.

samhocevar avatar Feb 27 '21 13:02 samhocevar

There are no meaningful sequences that start with V, so Va or Vi etc. would be fine. What would the other sequences for pinyin look like?

samhocevar avatar Feb 28 '21 10:02 samhocevar

These are all the vowels that are needed for pinyin:

1st tone: ā ē ī ō ū ǖ Ā Ē Ī Ō Ū Ǖ
2nd tone: á é í ó ú ǘ Á É Í Ó Ú Ǘ
3rd tone: ǎ ě ǐ ǒ ǔ ǚ Ǎ Ě Ǐ Ǒ Ǔ Ǚ
4th tone: à è ì ò ù ǜ À È Ì Ò Ù Ǜ
5th tone: a e i o u ü A E I O U Ü

I believe most of these already exist, so it would pretty much just be the third tone, which could all start with a capital V.

Including all pinyin combinations is another matter and would require a keyboard combination for all these combinations that would place the tone mark over the correct vowel.
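As a rough illustration of what "placing the tone mark over the correct vowel" involves, here is a Python sketch that converts numbered pinyin to marked pinyin. It assumes the commonly taught placement convention (a or e takes the mark; in "ou" the o takes it; otherwise the last vowel does) and treats "v" as a stand-in for ü. The function names are hypothetical, and the tone digit is restricted to 0-5 rather than the broader [0-9] mentioned earlier in the thread.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
U_UMLAUT = "\u00fc"  # precomposed u with diaeresis

def mark_syllable(syllable: str, tone: str) -> str:
    """Place the tone mark on one pinyin syllable given its tone digit."""
    syllable = syllable.replace("v", U_UMLAUT).replace("V", U_UMLAUT.upper())
    mark = TONE_MARKS.get(tone)
    if mark is None:
        return syllable  # tone 5 (or 0) is neutral: no mark
    lower = syllable.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "iou" + U_UMLAUT]
        if not vowels:
            return syllable  # no vowel to carry the mark
        pos = vowels[-1]
    marked = syllable[: pos + 1] + mark + syllable[pos + 1 :]
    return unicodedata.normalize("NFC", marked)

# Up to six letters followed by a tone digit.
SYLLABLE = re.compile("([A-Za-z\u00fc\u00dc]{1,6})([0-5])")

def numbered_to_marked(text: str) -> str:
    """Replace every letters+digit syllable in the text with its tone-marked form."""
    return SYLLABLE.sub(lambda m: mark_syllable(m.group(1), m.group(2)), text)

print(numbered_to_marked("ni3hao3"))
```

A compose-key table would need one entry per syllable and tone to get the same effect, which is why the full set is a much larger job than the vowels alone.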

cjbarth avatar Feb 28 '21 15:02 cjbarth

@samhocevar Is a PR welcome for this?

cjbarth avatar Mar 11 '21 00:03 cjbarth

Sorry, yes, it is totally welcome!

I plan to move the default sequences to a separate project so that Linux users may benefit from it, but that is not for the near future.

samhocevar avatar Mar 11 '21 06:03 samhocevar

In what file would you like me to add these sequences?

cjbarth avatar Apr 06 '21 21:04 cjbarth

I think src/rules/WinCompose.txt is the most appropriate place.

samhocevar avatar Apr 06 '21 21:04 samhocevar

Ok, I'm trying to figure out what to add, so that I don't do any duplicates. I looked up in Compose.pre the characters that Pinyin needs, and I found them all. However, I can't type them all. For example, when I do I should get ā, but I get ª instead. However, gets me ē, like I would expect. So it appears there are some double entries. How should I proceed?

cjbarth avatar Apr 07 '21 18:04 cjbarth

You can disable the XCompose sequences in Options → Composing → Sequences. These conflicts are unfortunate but in a later version I will make sure WinCompose.txt has precedence.

samhocevar avatar Apr 07 '21 20:04 samhocevar

That does seem to help. In that case, there doesn't seem to be a need to add any Pinyin-related characters, as they are all already in Compose.pre. Having said that, your comment about conflicts is valuable. I would love those options under Sequences to be a sortable list instead of check boxes. They can all remain enabled, as long as I get to pick precedence.

Having said that, I guess my question, which you may already know the answer to, is: Are there any characters that aren't available at all because the only ways to "compose" them are potentially overloaded? If so, it seems there are advantages to seeing if an alternate can be provided for those.

cjbarth avatar Apr 07 '21 20:04 cjbarth

Existing sequences

| 1 | 2 | 3 | 4 | Neutral |
|---|---|---|---|---|
| ā a- | á a' or 'a | ǎ va or ca | à or ˋa | a |
| ē e- or -e | é e' or 'e | ě ce | è or ˋe | e |
| ī i- or -i | í i' or 'i | ǐ vi or ci | ì ˋi | i |
| ō o- | ó o' or 'o | ǒ vo or co | ò or ˋo | o |
| ū u- or -u | ú u' or 'u | ǔ vu or cu | ù ˋu | u |
| ǖ _"u | ǘ '"u | ǚ c"u | ǜ ˋ"u | ü u" or "u |

Available sequences with v

| 1 | 2 | 3 | 4 | Neutral |
|---|---|---|---|---|
| ǖ v- | ǘ v' or 'v | ǚ cv | ǜ ˋv | ü |

JapanYoshi avatar Sep 03 '21 10:09 JapanYoshi

That is some good research. Thanks, @JapanYoshi. Would a PR be welcome for the v sequences? They are much easier to type than the ü-based ones.
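For reference, the proposed v sequences could be turned into rule lines with a short script. This is only a sketch: it assumes the rules file uses XCompose-style lines (`<Multi_key> <key> ... : "char" Uxxxx # NAME`), that the grave key in the table corresponds to the ASCII backtick, and that the keysym names are `minus`, `apostrophe`, and `grave`. The helper `rule_line` is hypothetical.

```python
import unicodedata

# Assumed keysym names for the non-letter keys used in the proposal.
KEYSYMS = {"-": "minus", "'": "apostrophe", "`": "grave"}

def rule_line(keys: str, output: str) -> str:
    """Format one XCompose-style rule for the given key sequence and result character."""
    names = " ".join(f"<{KEYSYMS.get(k, k)}>" for k in keys)
    return (
        f'<Multi_key> {names} : "{output}" '
        f"U{ord(output):04X} # {unicodedata.name(output)}"
    )

# Proposed sequences from the table: key sequence, resulting character.
PROPOSED = [
    ("v-", "\u01d6"),  # u with diaeresis and macron (1st tone)
    ("v'", "\u01d8"),  # u with diaeresis and acute (2nd tone)
    ("'v", "\u01d8"),
    ("cv", "\u01da"),  # u with diaeresis and caron (3rd tone)
    ("`v", "\u01dc"),  # u with diaeresis and grave (4th tone)
]

for keys, output in PROPOSED:
    print(rule_line(keys, output))
```

The generated lines would still need checking against the existing rules before going into a PR.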

cjbarth avatar Sep 28 '21 00:09 cjbarth