monkeytype icon indicating copy to clipboard operation
monkeytype copied to clipboard

Japanese input via IME produces 'errors' on every keystroke in test results

Open VanWeapon opened this issue 3 years ago • 5 comments

Did you clear cache before opening an issue?

  • [X] I have cleared my cache

Is there an existing issue for this?

  • [X] I have searched the existing issues

Does the issue happen when logged in?

Yes

Does the issue happen when logged out?

Yes

Does the issue happen in incognito mode when logged in?

Yes

Does the issue happen in incognito mode when logged out?

Yes

Account name

VanWeapon

Account config

No response

Current Behavior

After completing a test using Japanese Hiragana or Japanese Katakana as the language, and using an IME for the input method (I use google IME specifically) each keystroke seems to be counted as an error, even though it takes multiple keystrokes to produce a single non-vowel character when inputting via IME.

This results in a score which has a much poorer accuracy than expected, and makes it impractical to use the accuracy score measurement as a baseline to try and legitimately improve kana typing accuracy and wpm.

image

Expected Behavior

Errors should only be registered if the direct input characters do not match the conjoined character representation required to produce the kana output.

E.g. the hiragana character か "ka" requires 'k' and 'a' to be input before the IME transforms the character to か. Vowels such as お "o" only require a single character, and seem to be marked as correct? unsure as it's hard to verify.

If the expected character is か then the system should not treat the corresponding direct input characters 'k' and 'a' as errors, and should instead treat other characters as errors.

This gets more complicated with characters like ぎょ "gyo" which can be legitimately typed as either "gyo" or "gixyo" where, using an IME, you can type the two characters separately, and "x" is used to denote that the next character should be the small version of the character you are about to type, in this case よ "yo".

However, since we are only talking about kana (hiragana and katakana) there is a fixed number of accepted inputs that are possible to produce a given kana character or group of characters, so these could be (theoretically) statically mapped, as opposed to kanji where you are essentially performing a fuzzy dictionary search each time you try to write a word.

Steps To Reproduce

Install and activate an IME, either microsoft IME or Google IME. Microsoft ime is probably easier to get up and running: How to install Change test language to japanese hiragana or japanese katakana Complete a time or word test Review result and observe high number of errors in test results and low accuracy.

Environment

  • OS: Windows 10
  • Browser: Opera GX
  • Browser Version: 90.0.4480.117

Anything else?

I recognise this is probably not a high priority issue given low number of potential users, but wanted to raise it anyway for tracking. If you want assistance building out a map of direct input > japanese characters I can help with that and provide you a mapping in whatever format you need.

VanWeapon avatar Sep 24 '22 16:09 VanWeapon

Is this not just a dupe of #2545?

Ferotiq avatar Sep 27 '22 19:09 Ferotiq

Is this not just a dupe of #2545?

No, that issue refers to a kana input keyboard, where each physical key is a kana symbol. This issue is for the Japanese IME which converts English letters to kana on the fly

VanWeapon avatar Sep 28 '22 02:09 VanWeapon

Is this not just a dupe of #2545?

No, that issue refers to a kana input keyboard, where each physical key is a kana symbol. This issue is for the Japanese IME which converts English letters to kana on the fly

Gotcha

Ferotiq avatar Sep 28 '22 02:09 Ferotiq

To clarify, the outcome is pretty similar in both cases, in #2545 there is an accuracy hit because when using kana-input mode on the IME, the extra keystrokes required to build a diagraph result in lower accuracy, with this issue, the same general problem occurs, i.e. lower accuracy due to multiple keystrokes to input 1 character, except it is happening with romaji-input mode and happens for basically all non-vowel characters, not just diagraphs. Ref the screenshot where input modes are selectable.

image

As @Miodec mentioned in that issue, solving both issues does require a map of every character which is used to build the next character. But for both japanese-hiragana and japanese-katakana, that is a fixed-length list which I can provide in JSON or whatever other format.

I also agree that trying to solve the accuracy problem for Kanji is a completely separate and likely unobtainable goal, but kana is much more achievable, if something like below could be leveraged in the solution.

e.g.

{
"か": ["ka"], // only has one input method
"つ": ["tsu", "tu"], //has 2 valid input methods for a single character
"ぎょ": ["gyo","gixyo"], // has 2 correct methods of input, and produces 2 characters of output
}

VanWeapon avatar Sep 30 '22 03:09 VanWeapon

@VanWeapon I agree with your solution. However, I found a small problem with your example. "ぎょ" can be converted to not only “gyo” and “gixyo” but also "gilyo"! This will need to be taken into consideration when creating an actual solution.

weensy avatar Sep 08 '23 12:09 weensy