source-han-serif

About the region-specific forms in the non-unified CJK ideographs Unicode blocks

Open tamcy opened this issue 2 years ago • 0 comments

In addition to the unified CJK ideographs and the Kangxi Radicals Unicode blocks, region-specific glyph forms can also be found in the following blocks:

  • Enclosed CJK Letters and Months (U+3200-U+32FF)
  • Enclosed Ideographic Supplement (U+1F200-U+1F2FF)
  • CJK Compatibility (U+3300-U+33FF) (only U+337F ㍿ contains a region-specific form in this block)
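As a quick way to check which of these blocks a given character falls into, here is a minimal sketch. The block ranges are taken from the list above; the names `BLOCKS` and `block_of` are made up for illustration and are not part of any Source Han tooling.

```python
# Block ranges as listed above (inclusive).
BLOCKS = {
    "Enclosed CJK Letters and Months": (0x3200, 0x32FF),
    "Enclosed Ideographic Supplement": (0x1F200, 0x1F2FF),
    "CJK Compatibility": (0x3300, 0x33FF),
}


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    for ch in ("㈲", "🈵", "㍿", "有"):
        print(f"U+{ord(ch):04X}", ch, block_of(ch))
```

The last character, 有, is a plain unified ideograph, so it is reported as outside the three blocks.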

While I appreciate the inclusion of localized forms for these codepoints, I doubt their usefulness to non-Japanese users (with the exception of the enclosed number forms, which can at least be used in ordered lists). For instance, as far as I know, ㈲ (U+3232) is the abbreviated form of "有限会社" (limited company) in Japanese, but it has no specific meaning in a Chinese context. A localized form is therefore barely useful unless I want to explain the Japanese meaning of this character to CN/TW/HK users, and in that case I'd rather use its native JP glyph form.

Also, the characters at some codepoints, even in a localized stroke form, may still not be suitable for our daily use. For example, U+1F235 🈵 is the character "満" enclosed in a square to indicate "full". While there is a dedicated region-specific glyph (u1F235-CN) for CN/TW/HK, in TW/HK we would normally use "滿", and for CN the official simplified form is "满". So, strictly speaking, the current design of u1F235-CN is still not "entirely" localized. Whether to change it to an "entirely localized" form is debatable; it really depends on how you see it.
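To illustrate the point about U+1F235, here is a small sketch using Python's standard `unicodedata` module. It only looks at Unicode character data, not at the font: the compatibility mapping of the codepoint points at the Japanese form, whichever regional glyph a font draws. The `candidates` dictionary is just my own labelling of the three forms mentioned above.

```python
import unicodedata

SQUARED_FULL = "\U0001F235"  # 🈵

# Character name and compatibility decomposition from the Unicode data
# bundled with Python.
print(unicodedata.name(SQUARED_FULL))
print(unicodedata.decomposition(SQUARED_FULL))

# NFKC normalization replaces the squared symbol with its base ideograph.
base = unicodedata.normalize("NFKC", SQUARED_FULL)

# The three forms discussed above, labelled by region (illustrative only).
candidates = {"JP": "満", "TW/HK": "滿", "CN": "满"}
matches = [region for region, ch in candidates.items() if ch == base]
print(base, matches)
```

Only the JP form matches the base ideograph, so a CN/TW/HK glyph for this codepoint can adjust the stroke style but is still drawing the Japanese character.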

Lastly, several issues were found in the localized glyphs in these blocks, and some of them may require additional CIDs to fix completely, which may not be worth the effort given the reasons outlined above.

That is why I'd first suggest considering the removal of the non-JP glyphs from these blocks, with the exception of the enclosed numeric forms.



Proposed changes, if only the JP form is preserved

Enclosed CJK Letters and Months (U+3200-U+32FF)

| Codepoint | Region | Current mapping | Remarks |
| --- | --- | --- | --- |
| U+3226 ㈦ | HK | uni3226-TW | Remap to uni3226-JP |
| U+3227 ㈧ | HK | uni3227-TW | Remap to uni3227-CN |
| U+3286 ㊆ | HK | uni3286-TW | Remap to uni3286-JP |
| U+3287 ㊇ | HK | uni3287-TW | Remap to uni3287-CN |
| U+32AC ㊬ | JP | uni32AC-CN | Looks like the JP form was removed in v2.0. But the design of 臣 is different, so the CN form can't be used for JP. Need to restore the original JP glyph. |

Removable glyphs: uni322B-TW, uni3232-TW, uni3233-CN, uni3235-TW, uni3237-CN, uni323C-CN, uni323E-TW, uni3240-TW, uni3243-CN, uni3245-CN, uni3246-CN, uni3247-CN, uni3247-TW, uni328B-CN, uni328B-TW, uni3292-TW, uni3293-CN, uni3295-TW, uni3297-CN, uni329B-CN, uni329B-TW, uni329C-CN, uni329C-TW, uni329D-TW, uni329E-CN, uni329E-TW, uni32A2-CN, uni32A2-TW, uni32A9-CN, uni32A9-TW, uni32AA-CN, uni32AA-TW, uni32AC-CN, uni32AC-TW, uni32AE-TW, uni32B0-CN, uni32B0-TW

Enclosed Ideographic Supplement (U+1F200-U+1F2FF)

Removable glyphs: u1F211-CN, u1F216-CN, u1F217-CN, u1F218-CN, u1F219-TW, u1F21A-TW, u1F21B-TW, u1F21C-TW, u1F21D-CN, u1F21D-TW, u1F21F-CN, u1F21F-TW, u1F220-CN, u1F221-CN, u1F223-CN, u1F225-CN, u1F226-CN, u1F227-CN, u1F22B-CN, u1F22B-TW, u1F22F-TW, u1F232-TW, u1F233-CN, u1F233-TW, u1F235-CN, u1F236-TW, u1F239-CN, u1F239-TW, u1F23A-CN, u1F23B-TW, u1F243-CN, u1F243-TW, u1F246-CN, u1F246-TW, u1F247-CN

CJK Compatibility (U+3300-U+33FF)

Removable glyphs: uni337F-CN-V, uni337F-CN
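For anyone who wants to process the lists above with a script, here is a rough sketch that splits a glyph name into codepoint, region tag, and vertical flag. The pattern is inferred only from the names that appear in this issue (`uniXXXX-REGION`, `uXXXXX-REGION`, optional `-V` for the vertical form), so treat it as an assumption rather than the project's official naming rule. `parse_glyph_name` is a made-up helper name.

```python
import re

# Pattern inferred from the glyph names in this issue; not an official rule.
GLYPH_RE = re.compile(
    r"^(?:uni([0-9A-F]{4})|u([0-9A-F]{5,6}))-(JP|CN|TW|HK|KR)(-V)?$"
)


def parse_glyph_name(name):
    """Return (codepoint, region, is_vertical), or None if the name doesn't match."""
    m = GLYPH_RE.match(name)
    if m is None:
        return None
    codepoint = int(m.group(1) or m.group(2), 16)
    return codepoint, m.group(3), m.group(4) is not None


if __name__ == "__main__":
    for name in ("uni3232-TW", "u1F235-CN", "uni337F-CN-V"):
        cp, region, vertical = parse_glyph_name(name)
        print(f"{name}: U+{cp:04X} {chr(cp)} region={region} vertical={vertical}")
```

Grouping the parsed results by codepoint makes it easy to see which codepoints would lose both their CN and TW forms under this proposal.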


Alternatively, if it is decided to keep the current implementation because of other considerations, I would suggest addressing the glyph issues listed below.

Again, some changes require new glyphs to be introduced, mostly for HK. Instead of doing so, for the HK version I'd rather suggest using the JP forms for all glyphs in these three blocks to save the hassle. As a Hongkonger, I bet HK users will not need the localized forms for these three Unicode blocks and will barely miss the CN/TW forms.



Suggested changes if the region-specific standard is to be kept

Enclosed CJK Letters and Months (U+3200-U+32FF)

| Codepoint | Affected region(s) | Current mapping | Suggested changes |
| --- | --- | --- | --- |
| U+3226 ㈦ | HK | uni3226-TW | Remap to uni3226-JP |
| U+3227 ㈧ | HK | uni3227-TW | Remap to uni3227-CN |
| U+323C ㈼ | TW, HK | uni323C-CN | A new glyph is needed |
| U+323E ㈾ | HK | uni323E-TW | Remap to uni323E-JP |
| U+3240 ㉀ | HK | uni3240-TW | Remap to uni3240-JP |
| U+3247 ㉇ | TW, HK | uni3247-TW | Component 爭 not correct |
| U+3286 ㊆ | HK | uni3286-TW | Remap to uni3286-JP |
| U+3287 ㊇ | HK | uni3287-TW | Remap to uni3287-CN |
| U+3292 ㊒ | HK | uni3292-TW | Remap to uni3292-JP |
| U+329C ㊜ | CN, TW, HK | U+329C ㊜, uni329C-TW | Check the design of the 辶 component |
| U+329D ㊝ | HK | uni329D-TW | Remap to uni329D-JP |
| U+32A9 ㊩ | TW, HK | uni32A9-TW | 匸 component incorrect |
| U+32AC ㊬ | JP | uni32AC-CN | Looks like the JP form was removed in v2.0. But the design of 臣 is different, so the CN form can't be used for JP. Need to restore the original JP glyph. A new glyph is needed. |
| U+32AE ㊮ | HK | uni32AE-TW | Remap to uni32AE-JP |

Enclosed Ideographic Supplement (U+1F200-U+1F2FF)

| Codepoint | Affected region(s) | Current mapping | Suggested changes |
| --- | --- | --- | --- |
| U+1F21F 🈟 | HK | u1F21F-CN | "木" is different; a new glyph is needed |
| U+1F22B 🈫 | TW, HK | u1F22B-TW | Check the design of the 辶 component |
| U+1F232 🈲 | HK | u1F232-JP | Last stroke of the right 木 is different; a new glyph is needed |

CJK Compatibility (U+3300-U+33FF)

| Codepoint | Affected region(s) | Current mapping | Suggested changes |
| --- | --- | --- | --- |
| U+337F ㍿ | CN, TW, HK | uni337F-CN-V | Check if the mapping is intentional. Currently, JP maps to uni337F-JP by default, which is a horizontal form, while CN/TW/HK map to its vertical form uni337F-CN-V. This is different from v1.0 and Source Han Sans, where all regions use the horizontal form by default. |

Thanks!

tamcy, Dec 28 '21 09:12