
[Japanese] Kana-Input not working on digraphs (e.g. ぎゅ)

Open xlaech opened this issue 2 years ago • 9 comments

Did you clear cache before opening an issue?

  • [X] I have cleared my cache

Is there an existing issue for this?

  • [X] I have searched the existing issues

Does the issue happen when logged in?

Yes

Does the issue happen when logged out?

Yes

Does the issue happen in incognito mode when logged in?

N/A

Does the issue happen in incognito mode when logged out?

Yes

Account information

{"theme":"serika_dark","customTheme":false,"customThemeColors":["#323437","#e2b714","#e2b714","#646669","#d1d0c5","#ca4754","#7e2a33","#ca4754","#7e2a33"],"favThemes":[],"showKeyTips":true,"showLiveWpm":false,"showTimerProgress":true,"smoothCaret":true,"quickTab":false,"punctuation":false,"numbers":false,"words":"10","time":60,"mode":"words","quoteLength":[1],"language":"japanese_hiragana","fontSize":15,"freedomMode":false,"difficulty":"expert","blindMode":false,"quickEnd":false,"caretStyle":"default","paceCaretStyle":"default","flipTestColors":false,"layout":"default","funbox":"none","confidenceMode":"off","indicateTypos":"off","timerStyle":"mini","colorfulMode":false,"randomTheme":"off","timerColor":"main","timerOpacity":"1","stopOnError":"off","showAllLines":false,"keymapMode":"off","keymapStyle":"staggered","keymapLegendStyle":"lowercase","keymapLayout":"overrideSync","fontFamily":"Roboto_Mono","smoothLineScroll":false,"alwaysShowDecimalPlaces":false,"alwaysShowWordsHistory":false,"singleListCommandLine":"manual","capsLockWarning":true,"playSoundOnError":false,"playSoundOnClick":"off","soundVolume":"0.5","startGraphsAtZero":true,"swapEscAndTab":false,"showOutOfFocusWarning":true,"paceCaret":"off","paceCaretCustomSpeed":100,"repeatedPace":true,"pageWidth":"100","chartAccuracy":true,"chartStyle":"line","minWpm":"off","minWpmCustomSpeed":100,"highlightMode":"letter","alwaysShowCPM":false,"enableAds":"off","hideExtraLetters":false,"strictSpace":false,"minAcc":"off","minAccCustom":90,"showLiveAcc":false,"showLiveBurst":false,"monkey":false,"repeatQuotes":"off","oppositeShiftMode":"off","customBackground":"","customBackgroundSize":"cover","customBackgroundFilter":[0,1,1,1,1],"customLayoutfluid":"qwerty#dvorak#colemak","monkeyPowerLevel":"off","minBurst":"off","minBurstCustomSpeed":100,"burstHeatmap":false,"britishEnglish":false,"lazyMode":false}

Current Behavior

When selecting "Japanese - Hiragana" as the language one can type out simple Japanese Words in Hiragana.

When using a romaji-based input method (where you type out the words using the Latin alphabet) there are no issues. However, when using the Japanese kana input system (where each key corresponds to a syllable), the small versions of ょゅゃ in digraphs are counted as errors.

Expected Behavior

Typing "き + ゛ + ゅ" results in "ぎゅ" with the word being accepted.

At the moment the ぎ is accepted and ゅ is shown in red and counted as an error.

Steps To Reproduce

  1. Install Google IME https://www.google.co.jp/ime/
  2. Set Input to Kana-Input
  3. Start any lesson with ゅ in it

Environment

  • OS: Mac Big Sur 11.5
  • Browser: Firefox
  • Browser Version: 97.04

Anything else?

This is probably due to a fix trying to work with the romaji input system. The behaviour of kana input is very close to a normal English input system: one keypress = one character.

xlaech avatar Feb 20 '22 11:02 xlaech

This is a known issue and also pops up in Vietnamese and Korean. It's difficult to implement because we would need a map of every character and how it can be 'built' into the required character.

Miodec avatar Feb 20 '22 12:02 Miodec

Is the problem the changing of き to ぎ? Because otherwise this should be a different issue. With the exception of ゛ and ゜, all key inputs correspond to a character.

xlaech avatar Feb 20 '22 15:02 xlaech

Is the problem the changing of き to ぎ?

Yes

Edit: Input is continuously compared against the current character to calculate accuracy. This works well with the usual input methods where 1 key is 1 character. However, when multiple keystrokes can equal one character, the input will not match the required character until it's fully 'built', resulting in lower accuracy.

That's why the only way I can currently think of is to somehow implement a function that can look at the required character and the character typed so far, and return true or false depending on whether the former can be produced from the latter.
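To make the problem concrete, here is a minimal sketch of a per-keystroke comparison. It is an illustration only, not code from the Monkeytype codebase; the names `naiveCheck` and `naiveAccuracy` are hypothetical, and the accuracy formula (correct states divided by total states) is an assumption.

```typescript
// Hypothetical sketch: every intermediate input state is compared
// against the expected character, as described above.
interface KeystrokeResult {
  inputSoFar: string;
  correct: boolean;
}

// `states` holds what the input contains after each keystroke.
function naiveCheck(expected: string, states: string[]): KeystrokeResult[] {
  return states.map((inputSoFar) => ({
    inputSoFar,
    correct: inputSoFar === expected,
  }));
}

// Assumed formula: percentage of keystroke states that matched.
function naiveAccuracy(results: KeystrokeResult[]): number {
  if (results.length === 0) return 100;
  const correct = results.filter((r) => r.correct).length;
  return (correct / results.length) * 100;
}

// Kana input for ぎ: first き (U+304D), then ゛ turns it into ぎ (U+304E).
const results = naiveCheck("\u304E", ["\u304D", "\u304E"]);
const accuracy = naiveAccuracy(results); // 50: the intermediate き counts as a miss
```

The user made no mistake here, yet half of the keystroke states are flagged, which is the accuracy drop described above.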

Miodec avatar Feb 20 '22 16:02 Miodec

I see. For hiragana this would be a very small lookup list. Would it help if I provided one?

For Kanji this would be quite a big lookup table making it more viable to reduce this to a specific number of common words. Another Idea would be to look at the current state of what has been typed (in a text field) instead of managing the typing to display by oneself.

xlaech avatar Feb 20 '22 16:02 xlaech

I see. For hiragana this would be a very small lookup list. Would it help if I provided one?

Yeah

For Kanji this would be quite a big lookup table making it more viable to reduce this to a specific number of common words.

Still breaks things once the user types something that is not in the table. Plus this can get big.

Another Idea would be to look at the current state of what has been typed (in a text field) instead of managing the typing to display by oneself.

Not sure what you mean there

Miodec avatar Feb 23 '22 23:02 Miodec

Hey Miodec,

First, here is a list of all the changes in hiragana (https://en.wikipedia.org/wiki/Hiragana):

Hiragana-Changes

  • か + ゛= が
  • き + ゛= ぎ
  • く + ゛= ぐ
  • け + ゛= げ
  • こ + ゛= ご
  • さ + ゛= ざ
  • し + ゛= じ
  • す + ゛= ず
  • せ + ゛= ぜ
  • そ + ゛= ぞ
  • た + ゛= だ
  • ち + ゛= ぢ
  • つ + ゛= づ
  • て + ゛= で
  • と + ゛= ど
  • は + ゛= ば
  • ひ + ゛= び
  • ふ + ゛= ぶ
  • へ + ゛= べ
  • ほ + ゛= ぼ
  • は + ゜= ぱ
  • ひ + ゜= ぴ
  • ふ + ゜= ぷ
  • へ + ゜= ぺ
  • ほ + ゜= ぽ
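The list above could be turned into a lookup table along the following lines. This is a sketch under assumptions, not Monkeytype's implementation: the names `compositions`, `basesByTarget`, and `canBuildInto` are hypothetical, and the marks are assumed to arrive as the spacing characters ゛ (U+309B) and ゜ (U+309C).

```typescript
// Assumption: kana input emits the spacing marks U+309B and U+309C.
const DAKUTEN = "\u309B"; // ゛
const HANDAKUTEN = "\u309C"; // ゜

// Each pair is "base kana" followed by "resulting kana", taken from the list above.
const VOICED_PAIRS =
  "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばひびふぶへべほぼ";
const SEMI_VOICED_PAIRS = "はぱひぴふぷへぺほぽ";

// Maps "base + mark" to the composed kana, e.g. "き゛" -> "ぎ".
const compositions: Record<string, string> = {};
// Maps a composed kana to the bases it can be built from, e.g. "ぎ" -> ["き"].
const basesByTarget: Record<string, string[]> = {};

function addPairs(pairs: string, mark: string): void {
  // Normalize so every kana in the literal is a single precomposed code unit.
  const normalized = pairs.normalize("NFC");
  for (let i = 0; i < normalized.length; i += 2) {
    const base = normalized.charAt(i);
    const target = normalized.charAt(i + 1);
    compositions[base + mark] = target;
    (basesByTarget[target] = basesByTarget[target] || []).push(base);
  }
}

addPairs(VOICED_PAIRS, DAKUTEN);
addPairs(SEMI_VOICED_PAIRS, HANDAKUTEN);

// True if `typed` already equals `target` or is a base kana that a
// following mark can still turn into `target`.
function canBuildInto(typed: string, target: string): boolean {
  if (typed === target) return true;
  const bases = basesByTarget[target];
  return bases !== undefined && bases.indexOf(typed) !== -1;
}
```

With such a predicate, an intermediate き while ぎ is expected could be treated as "not wrong yet" instead of as an error.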

About the Kanji:

I assume that the system is currently implemented in a way where you check for keypresses by the user and compare those with the key expected next. This makes it very difficult to handle typing systems where the user "combines" characters before they are finally "committed", because all the combinations and special keys need to be accepted by the frontend, which is a lot of work.

(Ignoring the master mode, which is character based) one could just let the user input be handled by a regular text box (which supports any native keyboard layout out of the box) until the user has committed their text by pressing enter or space. Only after the whole message/word has been "committed" is it compared against the expected word.
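A rough sketch of that idea, assuming the committed contents of the text box are available as a plain string: intermediate composition states are never scored, only whole words. `scoreCommittedWords` and `WordResult` are hypothetical names used for illustration.

```typescript
// Hypothetical result shape for a word-level comparison.
interface WordResult {
  expected: string;
  committed: string;
  correct: boolean;
}

// Compares only committed words against the expected words.
// Splitting on \s+ also covers the ideographic space (U+3000).
function scoreCommittedWords(
  expectedWords: string[],
  committedText: string
): WordResult[] {
  const committedWords = committedText.trim().split(/\s+/);
  return expectedWords.map((expected, i) => {
    const committed = i < committedWords.length ? committedWords[i] : "";
    return { expected, committed, correct: committed === expected };
  });
}

// The IME has already combined き + ゛ + ゅ into ぎゅ before the commit.
const wordResults = scoreCommittedWords(
  ["\u304E\u3085\u3046", "\u306D\u3053"], // ぎゅう, ねこ
  "\u304E\u3085\u3046 \u306D\u3053"
);
```

The trade-off is that this yields per-word correctness only, so it cannot tell how many individual keystrokes were wrong.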

Since 10fastfingers uses a text box for its input anyway, it already supports Japanese input: https://10fastfingers.com/typing-test/japanese Monkeytype has a much cleaner and more streamlined look though... I don't know if one could have the text box "hidden" so it does not spoil the beautiful look.

xlaech avatar Feb 26 '22 21:02 xlaech

one could just let the user input be handled by a regular text box (which supports any native keyboard layout out of the box) until the user has committed their text by pressing enter or space. Only after the whole message/word has been "committed" is it compared against the expected word.

That's what we have right now - we have a hidden input box.

Since 10fastfingers uses a text box for its input anyway, it already supports Japanese input: https://10fastfingers.com/typing-test/japanese Monkeytype has a much cleaner and more streamlined look though... I don't know if one could have the text box "hidden" so it does not spoil the beautiful look.

Support is not an issue - accuracy is. Monkeytype should already support typing in pretty much any language, but calculating accuracy for languages that 'build' characters is an issue.

Miodec avatar Feb 26 '22 21:02 Miodec

That is a hard one. The easier way would probably be to not calculate an accuracy for these languages but still let people type in them.

For hiragana you could hardcode the exceptions listed above and then use another backend (e.g. an online dictionary, https://jisho.org/search/日本語) to get the hiragana reading for a specific kanji word. This way you can resolve kanji -> hiragana -> keypresses and calculate an accuracy. You would still need to account for the 2 different input methods: romaji being character based (2 keypresses per hiragana) vs. kana input (1 keypress per hiragana).

In the end this would be a lot of work just for getting the accuracy. Therefore it would be easier to not calculate it as a first step :)

xlaech avatar Feb 26 '22 21:02 xlaech

The easier way would probably be to not calculate an accuracy for these languages but still let people type in them.

I totally agree with @xlaech about this. Maybe accuracy could simply be left out for any language where it cannot be calculated.

nguyenanhducs avatar Aug 11 '22 08:08 nguyenanhducs

Pushed a new change that should stop the accuracy from being miscalculated. Could you guys please test it? (make sure your layout override is disabled in the website settings)

Miodec avatar Oct 04 '22 12:10 Miodec

Hey Gökmen,

As we already discussed in the thread above, the issue is about two different problems:

  • Lookup table:
      • We need a way to define which primitive inputs result in which end character.
      • I provided one for hiragana (kana mode) in the thread.
      • The list for kanji would be… let's say long 😉 So we would need to generate it with a script from a dictionary.
  • Accuracy calculation:
      • This task is quite challenging and should be considered separately.
      • The accuracy calculation assumes that you have 1 input for 1 character.
      • This would result in a low accuracy if the system is implemented as suggested above.

I hope this clarifies things. I would still be interested in this being implemented; however, I do understand that other issues might be more pressing, considering the small number of people typing in Japanese, let alone in kana mode 😊

Greetings, Nicolas

From: Gökmen Kaplan @.> Date: Thursday, 29 September 2022 at 21:08 To: monkeytypegame/monkeytype @.> Cc: Nicolas Mori @.>, Mention @.> Subject: Re: [monkeytypegame/monkeytype] [Japanese] Kana-Input not working on digraphs (e.g. ぎゅ) (Issue #2545)

I think this is a normalization (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize) issue.

I reread the thread and now I think I get it. Looking at the Unicode block for hiragana (https://en.wikipedia.org/wiki/Hiragana_(Unicode_block)), you can see there are different "combining characters" (idk what they're called officially) for ゛ and ゜; there are also separate characters for each combination (e.g. き and ぎ, which are characters on their own). So I tried copy-pasting the raw characters from the Wikipedia article, and I think I can explain it better with some screenshots. You can have the combination as a single code point: [image] https://user-images.githubusercontent.com/48455693/193119729-cbf8e7af-2969-48f0-9903-c9d19e32bb5d.png

Or a combination of two different code points: [image] https://user-images.githubusercontent.com/48455693/193119811-73b7ae23-7fdc-41b1-bd05-3f094fcb2a49.png
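The two representations can be checked with `String.prototype.normalize`. The sketch below is an illustration, not the change that was pushed to Monkeytype. It assumes kana input produces the spacing marks ゛ (U+309B) and ゜ (U+309C), which NFC does not compose, so they are first mapped to the combining marks U+3099 and U+309A. `composeKana` is a hypothetical helper name.

```typescript
// Spacing sound marks mapped to their combining counterparts.
const SPACING_TO_COMBINING: Record<string, string> = {
  "\u309B": "\u3099", // ゛ -> combining voiced sound mark
  "\u309C": "\u309A", // ゜ -> combining semi-voiced sound mark
};

// Replaces spacing marks with combining marks, then lets NFC
// normalization merge base kana + mark into one code point.
function composeKana(input: string): string {
  return input
    .replace(/[\u309B\u309C]/g, (mark) => SPACING_TO_COMBINING[mark])
    .normalize("NFC");
}

// ぎ as a single code point vs. き followed by a combining mark.
const decomposed = "\u304E".normalize("NFD"); // "\u304D\u3099", two code points
const recomposed = decomposed.normalize("NFC"); // back to "\u304E"

// Kana input sequence き + ゛ + ゅ becomes ぎゅ.
const composed = composeKana("\u304D\u309B\u3085"); // "\u304E\u3085"
```

Comparing normalized strings makes both representations of ぎ equal, but on its own it does not address the intermediate state where only き has been typed.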

maybe dead keys but "dead characters"?


xlaech avatar Oct 11 '22 09:10 xlaech

Pushed a new change that should stop the accuracy from being miscalculated. Could you guys please test it? (make sure your layout override is disabled in the website settings)

Wonderful! 👍 Problem solved. Google IME with kana input works perfectly now.

xlaech avatar Oct 12 '22 08:10 xlaech