source-han-serif

Consolidation of CJK component unification across different regions to reduce unnecessary characters

Open Marcus98T opened this issue 2 years ago • 4 comments

In hindsight, I should have created this page to consolidate my issues for every component that may need unification, rather than opening so many separate issues. Dr Lunde created a similar glyph consolidation page, but it has been largely abandoned since he left.

Over time, I will nitpick more details that I deem unnecessary and that could be shared across as many regions as possible without breaking the respective national government glyph standards too much. My general preference is for JP-style glyph shapes, with some exceptions. The purpose of unifying components is to stay under the 65536 glyph limit and leave more room to add necessary glyphs in the future.

I am well aware that some of the unifications presented here may have been discussed before (I didn't have the time to go through those discussions thoroughly) and could be rejected, but I am keeping them because I would like confirmation on whether these unifications can go ahead under the new design team from Arphic.

For now this is a very quick summary. After Chinese New Year, I will update it with a (perhaps incomplete) list of affected characters to fix for each component, if it isn't too much to handle. Unfortunately I may not be able to provide a complete list, as finding the affected component in every character is very time-consuming and there are currently no tools to do that instantly.
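As a rough illustration of how such a component search could work, here is a minimal sketch that walks Ideographic Description Sequences (IDS) recursively. The `SAMPLE_IDS` table is a tiny hand-written sample and an assumption for illustration only; a real search would need a full IDS database, and the function names are hypothetical.

```python
# Hand-written sample of Ideographic Description Sequences (IDS).
# These entries are illustrative assumptions, not an authoritative database.
SAMPLE_IDS = {
    "多": "⿱夕夕",
    "名": "⿱夕口",
    "移": "⿰禾多",
    "銘": "⿰金名",
    "歹": "⿱一夕",
    "死": "⿸歹匕",
    "吟": "⿰口今",
}


def contains_component(char, component, ids=SAMPLE_IDS, _seen=None):
    """Return True if `component` appears anywhere in the decomposition of `char`."""
    if char == component:
        return True
    seen = _seen if _seen is not None else set()
    if char in seen:
        # Already explored without success (also guards against cyclic data).
        return False
    seen.add(char)
    # Characters without an entry (including IDS operators) decompose to nothing.
    return any(
        contains_component(part, component, ids, seen)
        for part in ids.get(char, "")
    )


def find_characters_with(component, ids=SAMPLE_IDS):
    """List every character in `ids` that contains `component`, directly or nested."""
    return [c for c in ids if c != component and contains_component(c, component, ids)]


if __name__ == "__main__":
    print(find_characters_with("夕"))
```

Because the search is recursive, 移 and 銘 are found through 多 and 名 even though 夕 does not appear in their own entries.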

This page will be updated regularly if I can.

For components that I believe are safe to unify without breaking the 新字形 or Taiwan/Hong Kong educational standard rules

夕 component

I think having the dot (丶) touch (or at least sit very close to) the ク part should be OK for the CN region.

As per the example image (to be updated soon with more characters), 夕 should use the JP glyph (marked green) for all regions. The JP version of 多 is mapped to all regions except CN (marked yellow), which has an almost identical duplicate whose top component is shifted slightly to the right. 移 is currently mapped to JP for all regions. Screenshot 2022-01-29 at 00 25 44

For 多, I suggest moving the top component of the JP glyph a tiny bit to the right and then removing the CN glyph.

For 名, I suggest adjusting the 夕 part of the CN glyph so that the dot touches the ク part, which in turn matches the balance of the JP aesthetic. Screenshot 2022-01-29 at 00 50 45

死 component

Unlike the other components, for which I prefer the JP forms, here I prefer the CN form, with 匕 touching the topmost horizontal line (circled in red). If it can be unified, adjust the JP glyphs so that 匕 touches the topmost horizontal line (with no serif detail), then perhaps lower the 丿 part of 匕 a bit to match the CN glyph, as per the A15 remarks from #155, and then remove the redundant CN glyphs.

Alternatively, an easier way would be to remove the JP glyphs and map the CN glyphs to JP and KR if unifiable, but I strongly prefer the JP balance and aesthetics.

Screenshot 2022-01-29 at 01 30 50 If the CN glyph is to be kept instead, the 夕 part would have to be adjusted as well (circled in orange, see above "夕 component").

Understandably, we cannot unify all regions as the TW standard demands a more horizontal stroke for 匕.

While most Japanese fonts (samples from my Mac pre-installed fonts + Creative Cloud fonts) have the 匕 separated from the topmost horizontal stroke, some (even Adobe's own Kozuka fonts by a close margin) have 匕 touching the top horizontal line. Screenshot 2022-01-29 at 01 40 03

This is why I think it would be better to unify the 死 component, using both the JP-style 夕 and a 匕 that touches the topmost horizontal stroke.

冘 component

See here.

今 component (TW/HK regions only)

See here. I would like to also map the JP form of the 今 characters to the TW/HK versions if possible, but some other characters which contain 今 will simply have to be adjusted because the other components (the traditional printing forms of JP) cannot be used for TW/HK.

For the time being, the characters to review are: 今仱吟坅妗岒忴扲汵昑肣枔㲐玪矜砛紟耹䑤蚙衿訡赺趻軡鈐䩂靲䪩䰼鹶黅黔霒䶃䶖㪁欦雂鳹㕂岑庈芩笒棽琴含念侌衾貪酓霠𩃬

Screenshot 2022-01-29 at 03 44 07

LEGEND:
Green - v1 JP glyph to be restored (忴扲汵枔蚙赺*軡靲).
Purple - v1 JP glyph that cannot be restored due to unsuitability for TW/HK use (肣紟霒欦雂庈霠).
Cyan - v2 JP glyph to replace the TW/HK glyphs (今仱吟坅岒昑玪矜耹鈐鹶黅黔鳹棽琴含念).
Orange - The TW/HK glyph to be replaced by the v1 restored JP glyph (in green) and deleted (also map the restored glyphs to JP and KR).
Red - The TW/HK glyph to be replaced by the v2 JP glyph (in cyan) and deleted.
Yellow - Adjust TW/HK glyph to match JP aesthetics (妗肣紟衿霒雂岑庈芩笒衾酓霠).
Magenta - Replace glyph currently mapped to CN with TW/HK forms for JP/KR locales (雂庈).

*However, I am not entirely sure whether the v1 JP glyph for 赺 is suitable for TW/HK as the second stroke of 今 is different.
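To keep the legend and the review list in step, a small cross-check can help. The sketch below copies the character lists from this comment into a dictionary keyed by colour (Orange and Red are omitted because the legend gives no character lists for them) and reports legend characters missing from the review list as well as review characters that no colour mentions. The function names are hypothetical.

```python
# Review list and legend, copied from the comment above.
REVIEW = (
    "今仱吟坅妗岒忴扲汵昑肣枔㲐玪矜砛紟耹䑤蚙衿訡赺趻軡鈐䩂靲䪩䰼"
    "鹶黅黔霒䶃䶖㪁欦雂鳹㕂岑庈芩笒棽琴含念侌衾貪酓霠𩃬"
)
LEGEND = {
    "green": "忴扲汵枔蚙赺軡靲",
    "purple": "肣紟霒欦雂庈霠",
    "cyan": "今仱吟坅岒昑玪矜耹鈐鹶黅黔鳹棽琴含念",
    "yellow": "妗肣紟衿霒雂岑庈芩笒衾酓霠",
    "magenta": "雂庈",
}


def stray(review, legend):
    """Legend characters that are missing from the review list."""
    covered = {c for chars in legend.values() for c in chars}
    return sorted(covered - set(review))


def unassigned(review, legend):
    """Review characters that no legend colour mentions, in review order."""
    covered = {c for chars in legend.values() for c in chars}
    return [c for c in review if c not in covered]


if __name__ == "__main__":
    print("Not in review list:", "".join(stray(REVIEW, LEGEND)))
    print("No colour assigned:", "".join(unassigned(REVIEW, LEGEND)))
```

A character may legitimately appear under more than one colour, so the check only looks at coverage, not at overlaps.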

侖 component

See here.

竹 component (TW/HK regions only)

See here. Apparently this was not addressed when v2 rolled around, and I suppose the suggestion was rejected, so I will bring this up again.

While I cannot suggest changes to resemble another commercial typeface, for the sake of unifying components I have to give a reference (this one is sans-serif, but my point still stands). The reference is Hiragino Sans CNS, which was updated to completely follow Taiwan MOE standards on macOS 12 Monterey after many years of being an incomplete typeface with a mix of Japanese and mainland China glyph shapes. Simply put, the 竹 component in Hiragino Sans CNS is exactly the same as in the GB (China) and Japanese versions. Screenshot 2022-02-03 at 21 36 32

亙 component (TW/HK regions only)

See here, but this one is for Serif.

The characters to review are: 亙恆絚䱍揯㮓緪堩䱭

Screenshot 2022-02-03 at 20 35 26

I suggest removing these TW/HK glyphs and mapping them to CN (coloured in yellow): 亙 (HK only), 絚, 堩, 䱭

For components that are potentially controversial and might break the 新字形 or Taiwan/Hong Kong educational standard rules

辶 component

To summarise what our good friend said, unifying with the JP form (circled in green) could save quite a few CIDs, and the difference would hardly be visible to the naked eye. However, while I can't describe it well, it appears that the Chinese standard (circled in red) wants the bottom left component (circled in the 道 character) to differ from the JP form. This may be part of the 新字形 rule, as may the gap (circled in the 途 character). Screenshot 2022-01-29 at 00 45 30 My apologies if my colour coding and circling are confusing. The green circles mean the JP aesthetics to unify to and the red circles mean the current CN aesthetic.

We already know the suggestion was previously rejected, but today might be different with a new design team from Arphic.

八 component

I don't think the Song-style detail circled in the picture below is necessary. It could be unified with the JP-style glyph without much fuss, but to be honest the benefit would be small, as not many characters could be shared. That is especially true of 公, whose bottom part must be connected for the Chinese regions, as in handwriting. Screenshot 2022-01-29 at 00 35 42

There's an unencoded glyph called uni516BuE0101-JP (CID 61995 as of v2.001) which could become the CN/HK glyph, but that is pretty unlikely to materialise. Screenshot 2022-01-29 at 00 38 25
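To make the glyph name above easier to read, here is a minimal sketch that splits it into a base code point, a variation selector and a region tag. The naming pattern is an assumption inferred only from the single name quoted in this comment, not from the project's documentation, and `parse_glyph_name` is a hypothetical helper.

```python
import re

# Assumed pattern, inferred from one example name only:
# "uni" + base code point (hex), optional "u" + variation selector (hex),
# optional "-REGION" suffix.
GLYPH_NAME = re.compile(r"^uni([0-9A-F]{4,6})(?:u([0-9A-F]{4,6}))?(?:-([A-Z]+))?$")


def parse_glyph_name(name):
    """Split a glyph name such as 'uni516BuE0101-JP' into its parts."""
    match = GLYPH_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised glyph name: {name!r}")
    base_hex, selector_hex, region = match.groups()
    base = int(base_hex, 16)
    return {
        "char": chr(base),  # the base character itself
        "base": base,       # base code point as an integer
        "selector": int(selector_hex, 16) if selector_hex else None,
        "region": region,   # e.g. "JP", or None when there is no suffix
    }


if __name__ == "__main__":
    print(parse_glyph_name("uni516BuE0101-JP"))
```

For the name in this comment, the base is U+516B (八) and the selector is U+E0101, which falls in the Variation Selectors Supplement block.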

雨 component

See here.

尤 component

I lumped it with 冘 in the same issue page, but the simplified Chinese fonts treat this a bit differently, with the last stroke 乚 more separated from 丿 and touching the horizontal line.

Screenshot 2022-01-29 at 01 54 28 From top to bottom (all Mac fonts): Kaiti SC, Heiti SC, Lantinghei SC, SimSong (not SimSun), Songti SC, Yuanti SC

Funnily enough, the Kaiti font is a bit more similar to the JP/TW/HK form, and some characters from Heiti SC follow it as well. Those characters are all marked in green.

I am not suggesting that we follow another font, but I am making sure we have a good reference of where general Chinese fonts go in terms of glyph shape and whether we can accept a bit of the Japanese-style glyph shape for the CN region without making too much of a fuss.

UPDATE 1 (2022-01-30?): Added list of characters to replace in the 今 component, some minor suggestion changes to the 死 component and other miscellaneous changes.

UPDATE 2 (2022-02-03): Added 竹 and 亙 components. Also changed my heading to consider TW/HK standards as well.

Marcus98T, Jan 28 '22 18:01

I found a tool (in Chinese) that would enable me to easily have details on specific components, but the site is not HTTPS secure, so...

I apologize for not doing any colour coding. I really wanted to make the Adobe team's work easier and get my point across as quickly as possible before the next release cycle, but I am very busy with work and I am not even sure they would really work on it. The pictures I uploaded are therefore mostly devoid of colour and do not mark specific characters to adjust, replace or remove.

夕 component

Admittedly, the 夕 component is quite huge in scope and an extremely minor detail to adjust. It might not even save a good amount of CIDs compared to Source Han Sans.

I would focus on the 名, 多 and 死 components for a start.

As said before:

I think having the dot (丶) touch (or at least sit very close to) the ク part should be OK for the CN region.

Basically, some CN glyphs have the dot a bit further away from ク. I would like them standardized a bit more towards the JP form, even though, as I said, this is a very minor detail to adjust.

Here's an example image of what I want to achieve for 夕 and 死 components (red means I do not like the form, green means I prefer the form): Screenshot 2022-02-21 at 17 33 53

名 component

The characters to review are:

Character Unicode
U+540D
U+4F72
U+369A
U+59F3
U+6D3A
U+8317
𠱷 U+20C77
U+3AE5
𤥁 U+24941
U+7733
U+94ED
U+4285
U+8A7A
U+9169
U+35EE
U+9298
𨿅 U+28FC5
Screenshot 2022-02-21 at 17 25 57

For 名, I suggest adjusting the 夕 part of the CN glyph so that the dot touches the ク part, which in turn matches the balance of the JP aesthetic. Screenshot 2022-01-29 at 00 50 45
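Since the review tables in this comment list characters by code point, a small helper can turn the U+XXXX labels back into the characters themselves. This is a minimal sketch using only Python built-ins; `codepoint_to_char` is a hypothetical name, and the sample labels are taken from the tables in this comment.

```python
def codepoint_to_char(label):
    """Convert a 'U+XXXX' label into the character it names."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"expected a 'U+XXXX' label, got {label!r}")
    return chr(int(label[2:], 16))


def labels_to_text(labels):
    """Join the characters for a sequence of 'U+XXXX' labels into one string."""
    return "".join(codepoint_to_char(label) for label in labels)


if __name__ == "__main__":
    # First rows of the 名, 多 and 死 tables.
    print(labels_to_text(["U+540D", "U+591A", "U+6B7B"]))
```

The same helper works for code points outside the Basic Multilingual Plane, such as U+20C77.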

多 component

The characters to review are:

Character Unicode
U+591A
U+3689
U+4F88
U+5376
U+964A
U+3756
U+3794
U+3845
U+3881
U+54C6
U+5791
U+5953
U+59FC
U+6040
U+62F8
U+8324
U+8FFB
U+3A7C
U+3DB4
U+43E7
U+6245
U+6818
U+7239
𤥀 U+24940
U+388B
U+41CB
U+5067
U+591F
U+5920
U+75D1
U+7735
U+79FB
U+88B3
U+368A
U+3DC7
U+5921
U+7FD7
U+86E5
U+88B2
U+4412
U+4854
U+55F2
U+8A83
U+8D8D
U+8DE2
U+3485
U+35EC
U+368B
U+368C
U+40CE
U+451F
U+4B37
U+5925
U+9279
𣻗 U+23ED7
U+42FE
U+4AC2
U+5926
U+71AA
U+4B88
U+6A60
U+90FA
U+7C03
U+368D
U+3FD0
U+8B3B
U+9EDF
U+4D59

In this picture, blue means the v1 JP glyphs which were removed (and cannot be restored) and grey means v1 does not have the glyph. Screenshot 2022-02-21 at 17 29 58 Screenshot 2022-02-21 at 17 30 11

In this case I wish to balance out the 多 to be more compatible across different regions, especially 多 (U+591A), 奓 (U+5953), 扅 (U+6245), 爹 (U+7239), 痑 (U+75D1), 袲 (U+88B2) and 橠 (U+6A60). That would not save glyphs, but at least the design should be a lot more consistent.

Screenshot 2022-02-21 at 17 51 36

死 component

The characters to review are:

Character Unicode
U+6B7B
U+3C5D
U+3638
U+5C4D
U+3C37
U+6BD9
U+585F
U+81F0
U+846C
U+85A7
U+85A8
U+6583
U+9AD2
Screenshot 2022-02-21 at 17 26 09

Marcus98T, Feb 21 '22 09:02

I am not sure whether we should unify the 祭 component. From v1 to v2, I observed that the designer decided to separate the 祭 component because the top right part (open loop for JP, closed loop for CN) is considered a regional difference. However, I personally think that this "regional difference" is not very obvious and should be consolidated, either to the open JP form or to the closed CN form. If the closed CN form is preferred, the JP glyphs must be adjusted rather than replaced by the CN glyphs directly, because the JP glyphs are better designed (with the exception of those characters that contain 宀, e.g. 察, which is an essential regional difference).

Screenshot 2023-03-03 at 15 27 43 Screenshot 2023-03-03 at 15 27 49

In cyan are the Adobe-Japan1 glyphs.

The characters in the list are: 祭傺䏅㗫摖憏暩漈瘵蔡穄縩磜際㡜察鰶𬶭攃櫒䌨礤聺嚓𩟔擦鑔镲檫䕓䃰

However, I found that 攃 (U+6503, marked in yellow) still follows the JP form. For urgency's sake, please adjust it to the closed loop. In the long run, if Adobe decides to follow the closed CN form, the glyph should be adjusted. If they decide to follow the open JP form, it should not. If they decide to keep the regional difference, the 攃 glyph should still be adjusted.

Here is a quote from v1 as a reference:

image

The CN glyph for characters containing "祭" as component are inconsistent. Sometimes they are open (in red) and sometimes they are closed (green). Sometimes they share glyphs with JP (always open), sometimes they do not.

Marcus98T, Mar 03 '23 07:03

I reported a similar issue in Sans a few months ago, here is the quote:

Components with 心 and 必 are inconsistent in itself for JP, CN, TW and HK. Maybe we should just use JP aesthetics for all regions (except for 心 itself)? The circled parts in the examples below show the discrepancies between the JP and the CN forms. Examples: 思, 悲, 恋, 志, 泌, 秘 Screenshot 2023-01-10 at 23 49 23

So for Serif here, the 心 and 必 components should be considered for merging to the JP form for all regions.

In fact, while moving from v1 to v2, some glyphs got merged to the CN form (e.g. 擾 U+64FE), while others got merged to the JP form (e.g. 憂 U+6182), resulting in minor inconsistencies. So if possible, restore some JP glyphs to replace the CN glyphs. Otherwise, for radicals and components that are considered an essential regional difference, like 言, redesign the CN glyphs to match the JP glyphs. The redesign would also apply to Sans, by the way. I will not list specific glyphs, as there are too many to count for now.

Screenshot 2023-03-16 at 11 13 26

Marcus98T, Mar 16 '23 03:03

In addition to 侖, which I mentioned before, the 冊 (which also includes 扁 and others) and 而 components should also be looked at, with v1 glyphs adjusted or restored to match the JP glyphs wherever needed. I do not have the time to list every single JP glyph to be restored, but here are some examples that are in Big5 Level 1:

嗣 (U+55E3), 而 (U+800C), 耐 (U+8010)

Screenshot 2023-04-02 at 18 42 53

Marcus98T, Apr 02 '23 10:04