[Bug]: `disassembleHangul` is incorrect for double vowel and double consonant

Open roeniss opened this issue 1 year ago • 2 comments

Bug description

In the official docs, `disassembleHangul` is described as "한글 문자열을 글자별로 초성/중성/종성 단위로 완전히 분리하여" (in English, "completely separates a Korean string, character by character, into onset/nucleus/coda units"), which is not what the function actually does.

  1. Double consonants (e.g., ㄳ, ㄽ) should be treated as a single unit. Currently they aren't.
  2. Double vowels (e.g., ㅐ, ㅘ) should be treated as a single unit. Currently only some of them are.

Expected behavior

h.disassembleHangul('개') // I think this would be 'ㄱㅐ', and it is.
h.disassembleHangul('과') // I think this would be 'ㄱㅘ', but it isn't; it's 'ㄱㅗㅏ'.
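
For reference, the expected behavior corresponds to the standard Unicode arithmetic for precomposed Hangul syllables, where every syllable maps to exactly one onset, one nucleus, and at most one coda. The sketch below is only an illustration of that arithmetic: `splitIntoUnits` and the three jamo tables are hypothetical names, not part of es-hangul, and this is not how the library is implemented.

```typescript
// Hypothetical helper, NOT part of es-hangul. It relies on the Unicode layout
// of precomposed syllables: code = 0xAC00 + (onset * 21 + nucleus) * 28 + coda.
const ONSETS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ'; // 19 onsets
const NUCLEI = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ'; // 21 nuclei
const CODAS = 'ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ'; // 27 codas (index 0 of the Unicode table means "no coda")

const SYLLABLE_FIRST = 0xac00; // '가'
const SYLLABLE_LAST = 0xd7a3; // '힣'

function splitIntoUnits(text: string): string {
  return text
    .split('')
    .map((char) => {
      const code = char.charCodeAt(0);
      // Anything that is not a precomposed syllable passes through unchanged.
      if (code < SYLLABLE_FIRST || code > SYLLABLE_LAST) {
        return char;
      }
      const offset = code - SYLLABLE_FIRST;
      const coda = offset % 28;
      const nucleus = Math.floor(offset / 28) % 21;
      const onset = Math.floor(offset / (28 * 21));
      const codaJamo = coda === 0 ? '' : CODAS.charAt(coda - 1);
      return ONSETS.charAt(onset) + NUCLEI.charAt(nucleus) + codaJamo;
    })
    .join('');
}
```

With this sketch, compound vowels and compound final consonants stay single units, e.g. `splitIntoUnits('과')` gives 'ㄱㅘ' and `splitIntoUnits('값')` gives 'ㄱㅏㅄ'.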

To Reproduce

[email protected]

Possible Solution

Skipped, because I'm unsure whether this is intentional or not.

etc.

The full test cases are below. I think every assertion should pass.

// double consonant 1 (ㄲ, ㄸ, ㅃ, ㅆ, ㅉ)
// onset 
h.disassembleHangul('까') == 'ㄲㅏ'
h.disassembleHangul('따') == 'ㄸㅏ'
h.disassembleHangul('빠') == 'ㅃㅏ'
h.disassembleHangul('싸') == 'ㅆㅏ'
h.disassembleHangul('짜') == 'ㅉㅏ'

// coda
h.disassembleHangul('갂') == 'ㄱㅏㄲ'
h.disassembleHangul('갔') == 'ㄱㅏㅆ'

// double consonant 2 (ㄳ, ㄵ, ㄶ, ㄺ, ㄻ, ㄼ, ㄽ, ㄾ, ㄿ, ㅀ, ㅄ)
// coda
h.disassembleHangul('갃') == 'ㄱㅏㄳ' // false
h.disassembleHangul('갅') == 'ㄱㅏㄵ' // false
h.disassembleHangul('갆') == 'ㄱㅏㄶ' // false
h.disassembleHangul('갉') == 'ㄱㅏㄺ' // false
h.disassembleHangul('갊') == 'ㄱㅏㄻ' // false
h.disassembleHangul('갋') == 'ㄱㅏㄼ' // false
h.disassembleHangul('갌') == 'ㄱㅏㄽ' // false
h.disassembleHangul('갍') == 'ㄱㅏㄾ' // false
h.disassembleHangul('갎') == 'ㄱㅏㄿ' // false
h.disassembleHangul('갏') == 'ㄱㅏㅀ' // false
h.disassembleHangul('값') == 'ㄱㅏㅄ' // false

// single vowel (ㅏ, ㅑ, ㅓ, ㅕ, ㅗ, ㅛ, ㅜ, ㅠ, ㅡ, ㅣ)
// nucleus
h.disassembleHangul('가') == 'ㄱㅏ'
h.disassembleHangul('갸') == 'ㄱㅑ'
h.disassembleHangul('거') == 'ㄱㅓ'
h.disassembleHangul('겨') == 'ㄱㅕ'
h.disassembleHangul('고') == 'ㄱㅗ'
h.disassembleHangul('교') == 'ㄱㅛ'
h.disassembleHangul('구') == 'ㄱㅜ'
h.disassembleHangul('규') == 'ㄱㅠ'
h.disassembleHangul('그') == 'ㄱㅡ'
h.disassembleHangul('기') == 'ㄱㅣ'

// double vowel (ㅐ, ㅒ, ㅔ, ㅖ, ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ)
// nucleus
h.disassembleHangul('개') == 'ㄱㅐ'
h.disassembleHangul('걔') == 'ㄱㅒ'
h.disassembleHangul('게') == 'ㄱㅔ'
h.disassembleHangul('계') == 'ㄱㅖ'
h.disassembleHangul('과') == 'ㄱㅘ' // false
h.disassembleHangul('괘') == 'ㄱㅙ' // false
h.disassembleHangul('괴') == 'ㄱㅚ' // false
h.disassembleHangul('궈') == 'ㄱㅝ' // false
h.disassembleHangul('궤') == 'ㄱㅞ' // false
h.disassembleHangul('귀') == 'ㄱㅟ' // false
h.disassembleHangul('긔') == 'ㄱㅢ' // false

roeniss avatar Apr 20 '24 16:04 roeniss

I guess this is highly related to the common Korean keyboard layout, which makes sense in some ways.

I just wanted to point out that the inconsistency might add cognitive load for users.
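
To make the keyboard-layout guess concrete: on the standard two-set (dubeolsik) layout, ㅐ, ㅒ, ㅔ, ㅖ and ㄲ, ㄸ, ㅃ, ㅆ, ㅉ each have their own key (some with Shift), while the remaining compound vowels and compound final consonants take two keystrokes. The sketch below assumes that is the rule behind the current output. `TWO_KEYSTROKE_JAMO` and `splitByKeystroke` are hypothetical names, not taken from es-hangul's source.

```typescript
// Hypothetical table, NOT copied from es-hangul: compound jamo that need two
// keystrokes on the dubeolsik layout, mapped to the keys that produce them.
const TWO_KEYSTROKE_JAMO: { [jamo: string]: string } = {
  // compound vowels
  'ㅘ': 'ㅗㅏ', 'ㅙ': 'ㅗㅐ', 'ㅚ': 'ㅗㅣ',
  'ㅝ': 'ㅜㅓ', 'ㅞ': 'ㅜㅔ', 'ㅟ': 'ㅜㅣ', 'ㅢ': 'ㅡㅣ',
  // compound final consonants
  'ㄳ': 'ㄱㅅ', 'ㄵ': 'ㄴㅈ', 'ㄶ': 'ㄴㅎ',
  'ㄺ': 'ㄹㄱ', 'ㄻ': 'ㄹㅁ', 'ㄼ': 'ㄹㅂ', 'ㄽ': 'ㄹㅅ',
  'ㄾ': 'ㄹㅌ', 'ㄿ': 'ㄹㅍ', 'ㅀ': 'ㄹㅎ', 'ㅄ': 'ㅂㅅ',
};

// Expands every two-keystroke jamo in an already-disassembled string.
// Single-key jamo such as ㅐ or ㄲ are left untouched.
function splitByKeystroke(jamo: string): string {
  return jamo
    .split('')
    .map((unit) => TWO_KEYSTROKE_JAMO[unit] || unit)
    .join('');
}
```

Under this assumption, 'ㄱㅘ' becomes 'ㄱㅗㅏ' and 'ㄱㅏㅄ' becomes 'ㄱㅏㅂㅅ', while 'ㄱㅐ' and 'ㄲㅏ' stay as they are, which matches the passing and failing assertions listed above.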

roeniss avatar Apr 21 '24 12:04 roeniss

Maybe some of its uses involve re-assembling the disassembled output. We might need an option for handling double consonants.

export function assembleHangul(words: string[]) {
  const disassembled = disassembleHangul(words.join('')).split('');
  return disassembled.reduce(binaryAssembleHangul);
}

assembleHangul(['값', 'ㅣ ', '너무', '빘 ', 'ㅏ']) // it's useful for producing "갑시 너무 비싸"

KangYunHo1221 avatar May 14 '24 08:05 KangYunHo1221

Thank you for the thoughtful feedback. I'll keep the issue closed since there has been no further discussion. If you'd like to discuss it further, please feel free to reopen the issue.

okinawaa avatar Jun 01 '24 12:06 okinawaa