source-han-sans Consolidation of CJK component unification across different regions to reduce unnecessary characters

Marcus98T opened this issue 2 years ago.

This page is under construction because I posted it too early, without checking. For now, I have made enough edits to remove references to Serif and focus mostly on Sans.

I created this page to consolidate all my issues about components that may need unification, rather than opening many separate issues. This one is for Source Han Sans, so my unification objectives differ from those for Serif.

People have been reporting glyphs that do not conform to China's 新字形 (new character shape) rules. I think this is because the glyph slots are chock-full from having to cater to different national standards, so only the most common characters (including some traditional Chinese characters) get the 新字形 treatment, while the rest keep the JP/TW/HK look because they are rare characters that hardly anyone in China would use. By unifying some components, as others pointed out earlier, there would be more room for CN-style glyphs for rare characters.

For the time being, any components I missed in the initial edit are in the following quote from the above link. I will update this page to include them in the future.

If there are not enough slots left, maybe we should consider merging some non-essential regional differences like ⻌, ⺮, 䒑 (as in 豆), 𠂆 between CN and JP? And we can also keep only the Japanese style of Japanese dingbats like ㍿. I believe those have been suggested in other issues.

So, to summarise from the Serif page:

Over time, I will nitpick more details that I consider unnecessary regional differences and that could be shared across as many regions as possible, without breaking each national government's glyph standard too much. My general preference is for JP-style glyph shapes, with some exceptions. The purpose is to ensure that, by unifying components, we can keep the font under the 65,535-glyph limit and have more room to add necessary glyphs in the future.

I am well aware that some of the unifications presented here may have been discussed before (I have not had time to go through those discussions thoroughly) and could be rejected, but I am keeping them because I would like confirmation on whether they can go ahead under the new design team from Arphic.

For now, this is a very quick summary. After Chinese New Year, I will update it with a (perhaps incomplete) list of affected characters to fix for each component, if it isn't too much to handle. Unfortunately, I may not be able to provide a complete list, because finding a given component in every character is very time-consuming and there are currently no tools to do it instantly.

I will update this page regularly when I can.

For components that I believe are safe to unify without breaking the 新字形 or Taiwan/Hong Kong educational standard rules

夕 component

There are two ways to unify this: The JP/TW/HK way and the CN way.

JP/TW/HK way

If this approach is followed, I think having the dot (丶) touch (or at least come very close to) the ク part should be acceptable for the CN region.

As the example image shows (it will be updated soon with more characters), the JP versions of 多 and 移 are currently mapped to all regions except CN (marked yellow).

[Screenshot 2022-02-03 at 20 50 22]

On another note, 夕 could be mapped to the TW/HK forms for the JP/KR regions for consistency's sake. However, that might not be an option, as it would go against Japanese standards (though there are probably exceptions to the rule).

For 名, I suggest adjusting the 夕 part of the CN glyph so that the dot touches the ク part, matching the balance of the JP design. The TW/HK regions can then be mapped to the adjusted CN glyph.

[Screenshot 2022-02-03 at 20 54 12]
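The glyph savings from this kind of remapping can be counted directly: each character costs as many glyphs as it has distinct glyph names across its regions. Below is a minimal Python sketch that models the 多 (U+591A) and 移 (U+79FB) case described above. The `uniXXXX-REGION` glyph names and the region table are illustrative assumptions modelled on the names used in this issue, not data extracted from the font.

```python
from typing import Dict

REGIONS = ("CN", "JP", "KR", "TW", "HK")


def glyph_cost(mapping: Dict[str, Dict[str, str]]) -> int:
    """Number of distinct glyphs needed to cover every region of every character."""
    return len({glyph for regions in mapping.values() for glyph in regions.values()})


# Hypothetical glyph names. Current state as described above: the JP glyph
# of 多 (U+591A) and 移 (U+79FB) serves every region except CN.
current = {
    ch: {r: f"uni{ord(ch):04X}-{'CN' if r == 'CN' else 'JP'}" for r in REGIONS}
    for ch in ("\u591A", "\u79FB")
}

# JP/TW/HK way: the CN region also points at the JP glyph.
unified = {ch: {r: f"uni{ord(ch):04X}-JP" for r in REGIONS} for ch in current}

saved = glyph_cost(current) - glyph_cost(unified)
print(glyph_cost(current), glyph_cost(unified), saved)  # 4 2 2
```

Merging the CN region into the JP glyph removes one glyph per character, a saving that repeats for every character sharing the component.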

CN way (update)

I realised there is another problem: most commercial Japanese typefaces seem to follow the CN form (except for the ones marked yellow).

[Screenshot 2022-02-03 at 22 11 47]

In light of this, I suggest another way to unify the 夕 component: Adjust the JP/KR/TW/HK glyphs to match the CN form, and then remove the redundant CN glyphs.

Alternatively, the CN forms could be mapped to JP/KR/TW/HK wherever possible, and the now-redundant glyphs removed. For characters whose other components cannot be unified (e.g. those with the 言 or 糸 component), the 夕 part would have to be adjusted to match the CN form.

However, I think that would take a lot of work, and we would also need to consider whether the TW/HK standards allow the CN form (incidentally, Hiragino Sans CNS and PingFang TC/HK use the CN forms).

For the 死 component, it's only a matter of unifying the 夕 part, so either:

Map the JP-style glyph to the CN region, or
Map the CN-style glyph to the JP/KR/HK regions, then adjust the TW glyph so that its 夕 part matches the CN form.

Example: U+6B7B (死) itself

Map uni6B7B-JP to CN, then remove uni6B7B-CN, or
Map uni6B7B-CN to JP/KR/HK, remove uni6B7B-JP, and adjust uni6B7B-TW so that its 夕 part matches the CN form.
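The two options for 死 amount to the same operation with the roles of the glyphs swapped: point the regions that use one glyph at the other, then drop the glyph nobody references any more. A minimal Python sketch under stated assumptions: the current region assignment for U+6B7B is inferred from the wording above (JP/KR/HK sharing one glyph, CN and TW each having their own) and is not read from the font, and `remap` is a hypothetical helper, not part of any Adobe tooling.

```python
from typing import Dict, Set


def remap(regions: Dict[str, str], keep: str, drop: str) -> Set[str]:
    """Point every region that uses `drop` at `keep` instead.

    Returns the glyph names no longer referenced by any region,
    i.e. the glyphs that could then be removed from the font.
    """
    before = set(regions.values())
    for region in list(regions):
        if regions[region] == drop:
            regions[region] = keep
    return before - set(regions.values())


# Assumed current mapping for U+6B7B (inferred, not read from the font).
current = {"CN": "uni6B7B-CN", "JP": "uni6B7B-JP", "KR": "uni6B7B-JP",
           "HK": "uni6B7B-JP", "TW": "uni6B7B-TW"}

option_jp = dict(current)
print(remap(option_jp, keep="uni6B7B-JP", drop="uni6B7B-CN"))  # {'uni6B7B-CN'}

option_cn = dict(current)
print(remap(option_cn, keep="uni6B7B-CN", drop="uni6B7B-JP"))  # {'uni6B7B-JP'}
# In the CN option, uni6B7B-TW keeps its own slot; only its 夕 part is redrawn.
```

Either way one glyph is freed; the choice between them is a design decision, not something the mapping can settle.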

冘 and 尤 components

See here.

竹 component

See here. Apparently this has still not been addressed, and I suppose the suggestion was rejected, so I am bringing it up again.

While I cannot suggest changes that make the font resemble another commercial typeface, I have to give a reference for the sake of unifying components. That reference is Hiragino Sans CNS, which was updated in macOS 12 Monterey to fully follow the Taiwan MOE standards, after many years of being an incomplete typeface with a mix of Japanese and mainland Chinese glyph shapes. Put simply, the 竹 component in Hiragino Sans CNS is exactly the same as in the GB (China) and Japanese versions.

[Screenshot 2022-02-03 at 21 36 32]

亙 component (TW/HK regions only)

See here.

Miscellaneous

花 (U+82B1)

花 (U+82B1) would need to have its KR and CN forms (marked in red) merged. Either:

Remove the CN glyph and assign uni82B1uE0101-JP to the CN locale, or
Remove uni82B1uE0101-JP and assign the uni82B1-CN glyph to the KR locale and to the alternate JP glyph's IVS in Adobe-Japan1, then adjust the JP, TW and HK glyphs to match the form and balance of uni82B1-CN.
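The name uni82B1uE0101 denotes an Ideographic Variation Sequence (IVS): the base character U+82B1 followed by the variation selector U+E0101 (VARIATION SELECTOR-18), which appears to be how the alternate Adobe-Japan1 glyph is selected. The Python sketch below shows how such a sequence is formed and how a name in the style used in this issue can be derived from it. `ivs` and `glyph_name` are hypothetical helpers that only mimic the quoted names; they are not the font's actual build tooling.

```python
import unicodedata


def ivs(base: str, selector: int) -> str:
    """Build an Ideographic Variation Sequence: base ideograph + variation selector."""
    if not 0xE0100 <= selector <= 0xE01EF:
        raise ValueError("IVS selectors are VS17..VS256 (U+E0100..U+E01EF)")
    return base + chr(selector)


def glyph_name(seq: str, region: str) -> str:
    """Hypothetical naming helper mimicking names such as uni82B1uE0101-JP.

    BMP code points become 'uniXXXX', supplementary ones 'uXXXXX'.
    """
    parts = [f"uni{ord(ch):04X}" if ord(ch) <= 0xFFFF else f"u{ord(ch):X}" for ch in seq]
    return "".join(parts) + "-" + region


seq = ivs("\u82B1", 0xE0101)  # 花 + VS18
print([f"U+{ord(c):04X}" for c in seq])  # ['U+82B1', 'U+E0101']
print(unicodedata.name(seq[1]))          # VARIATION SELECTOR-18
print(glyph_name(seq, "JP"))             # uni82B1uE0101-JP
print(glyph_name("\u82B1", "CN"))        # uni82B1-CN
```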

For components that are potentially controversial and might break the 新字形 or Taiwan/Hong Kong educational standard rules

辶 component

See here.

人 component (when it's the top component)

I can't say much more, because this was already rejected, and Dr Lunde said it would be dangerous to suggest changes that would make Source Han Sans CN look like an existing commercial typeface. With a different design team, however, we won't know the outcome for quite a long time.

今 component (TW/HK regions only)

The 人 roof would have to be unified first for this to happen, so for now I would not touch it.

Changelog

UPDATE 1 (2022-02-06): Added 花 (U+82B1) under Miscellaneous and clarified the wording about unifying the 死 component

Feb 03 '22 12:02 Marcus98T

I have made some amendments. If anyone received the initial unfinished version by email and saved it internally (I am not sure how people save issues, or whether revisions are tracked), please refer to this page on GitHub for the most up-to-date version and save that instead to replace the unfinished one.

Feb 03 '22 14:02 Marcus98T

I found a tool (in Chinese) that lets me easily look up details on specific components, but the site does not use HTTPS, so...

I apologise for not doing any colour coding. I wanted to make the Adobe team's work easier and get my point across as quickly as possible before the next release cycle, but I am very busy with work and not even sure whether they will act on it. So the pictures I uploaded are mostly uncoloured, and they do not mark specific characters to adjust, replace or remove.

夕 component

If it can be unified to a single form using one of the following options, it will free up a considerable number of CIDs for new glyphs.

Option 1: JP/TW/HK form

Option 2: CN form

The list may still not be exhaustive.

The characters to review are:

Character	Unicode
夕	U+5915
歹	U+6B79
㐴	U+3434
㒱	U+34B1
㚈	U+3688
外	U+5916
夗	U+5917
夘	U+5918
歺	U+6B7A
邜	U+909C
名	U+540D
夙	U+5919
多	U+591A
夛	U+591B
岁	U+5C81
汐	U+6C50
舛	U+821B
芕	U+8295
㔰	U+3530
㱑	U+3C51
㶤	U+3DA4
斘	U+6598
𣏐	U+233D0
㑉	U+3449
㑕	U+3455
㚉	U+3689
㠾	U+383E
佲	U+4F72
侈	U+4F88
刿	U+523F
卶	U+5376
夝	U+591D
妴	U+59B4
宛	U+5B9B
矽	U+77FD
穸	U+7A78
罗	U+7F57
苑	U+82D1
茒	U+8312
迯	U+8FEF
陊	U+964A
𠧧	U+209E7
𠰻	U+20C3B
㘶	U+3636
㚚	U+369A
㝖	U+3756
㞔	U+3794
㡅	U+3845
㢁	U+3881
㢷	U+38B7
㱔	U+3C54
㼝	U+3F1D
哆	U+54C6
哕	U+54D5
垑	U+5791
奓	U+5953
奖	U+5956
姳	U+59F3
姼	U+59FC
怨	U+6028
恀	U+6040
拶	U+62F6
拸	U+62F8
栁	U+6801
桞	U+685E
洬	U+6D2C
洺	U+6D3A
茗	U+8317
茤	U+8324
荈	U+8348
迻	U+8FFB
𠱷	U+20C77
𡶷	U+21DB7
𢪸	U+22AB8
㑩	U+3469
㤪	U+392A
㩼	U+3A7C
㫥	U+3AE5
㶴	U+3DB4
㽜	U+3F5C
䀤	U+4024
䍃	U+4343
䏑	U+43D1
䏧	U+43E7
倇	U+5007
剜	U+525C
夞	U+591E
扅	U+6245
曻	U+66FB
栘	U+6818
桀	U+6840
桚	U+685A
桝	U+685D
桨	U+6868
浆	U+6D46
爹	U+7239
珟	U+73DF
盌	U+76CC
眢	U+7722
鸳	U+9E33
𡖔	U+21594
𡖖	U+21596
𤥀	U+24940
𤥁	U+24941
𧥧	U+27967
㓘	U+34D8
㢋	U+388B
䇋	U+41CB
䇟	U+41DF
䖤	U+45A4
䚻	U+46BB
偧	U+5067
啘	U+5558
啰	U+5570
埦	U+57E6
够	U+591F
夠	U+5920
婉	U+5A49
帵	U+5E35
惋	U+60CB
捥	U+6365
梦	U+68A6
涴	U+6DB4
猡	U+7321
痑	U+75D1
眳	U+7733
眵	U+7735
移	U+79FB
秽	U+79FD
菀	U+83C0
萝	U+841D
袳	U+88B3
逻	U+903B
釸	U+91F8
铭	U+94ED
㚊	U+368A
㱧	U+3C67
㷇	U+3DC7
䊅	U+4285
䛄	U+46C4
傑	U+5091
夡	U+5921
惌	U+60CC
晼	U+667C
椀	U+6900
椉	U+6909
椤	U+6924
焥	U+7125
琬	U+742C
粦	U+7CA6
翗	U+7FD7
翙	U+7FD9
腕	U+8155
舜	U+821C
葾	U+847E
蛥	U+86E5
袲	U+88B2
飧	U+98E7
𡟃	U+217C3
𣃽	U+230FD
𣸱	U+23E31
䐒	U+4412
䘼	U+463C
䡔	U+4854
䢣	U+48A3
亃	U+4E83
嗲	U+55F2
夢	U+5922
嵥	U+5D65
搩	U+6429
摉	U+6449
滐	U+6ED0
畹	U+7579
睕	U+7755
碗	U+7897
詺	U+8A7A
誃	U+8A83
趍	U+8D8D
跢	U+8DE2
酩	U+9169
酱	U+9171
锣	U+9523
鹓	U+9E53
𠹳	U+20E73
𡩣	U+21A63
𤾂	U+24F82
㒅	U+3485
㔂	U+3502
㗬	U+35EC
㗮	U+35EE
㚋	U+368B
㚌	U+368C
㷠	U+3DE0
㻧	U+3EE7
䃎	U+40CE
䑝	U+445D
䑱	U+4471
䔟	U+451F
䗕	U+45D5
䬷	U+4B37
僢	U+50E2
僯	U+50EF
僲	U+50F2
夣	U+5923
夤	U+5924
夥	U+5925
榤	U+69A4
箢	U+7BA2
箩	U+7BA9
粼	U+7CBC
綩	U+7DA9
舞	U+821E
蜿	U+873F
鄰	U+9130
鉹	U+9279
銘	U+9298
锵	U+9535
隣	U+96A3
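Because these tables pair each character with its code point by hand, a quick automated check can catch transcription slips before a list is used. A minimal Python sketch, assuming the table is pasted as tab-separated `Character<TAB>Unicode` rows exactly as shown above; `check_table` is a hypothetical helper.

```python
def check_table(text: str) -> list:
    """Return (character, code) rows whose character does not match its code point.

    Expects the tab-separated layout used in this issue: a header row,
    then one `character<TAB>U+XXXX` row per line.
    """
    mismatches = []
    for line in text.strip().splitlines()[1:]:  # skip the "Character\tUnicode" header
        char, code = (field.strip() for field in line.split("\t"))
        valid = len(char) == 1 and code.startswith("U+")
        if not valid or ord(char) != int(code[2:], 16):
            mismatches.append((char, code))
    return mismatches


sample = "Character\tUnicode\n夕\tU+5915\n歹\tU+6B79\n外\tU+5916\n"
print(check_table(sample))                             # []
print(check_table("Character\tUnicode\n夕\tU+5916\n"))  # [('夕', 'U+5916')]
```

Pasting a whole table into `sample` flags only the rows that need a second look.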

The JP version alone already shows a lot of inconsistency. Yellow marks the CN form of 夕.

名 component

The characters to review are:

Character	Unicode
名	U+540D
佲	U+4F72
㚚	U+369A
姳	U+59F3
洺	U+6D3A
茗	U+8317
𠱷	U+20C77
㫥	U+3AE5
𤥁	U+24941
眳	U+7733
铭	U+94ED
䊅	U+4285
詺	U+8A7A
酩	U+9169
㗮	U+35EE
銘	U+9298
𨿅	U+28FC5

Green marks v1 JP glyphs that could be restored (if the JP/TW/HK form is chosen).

[Screenshot 2022-02-21 at 19 17 30]

As mentioned before, 名 (U+540D) is a special case. If the JP/TW/HK form is chosen, follow the quote below:

For 名, I suggest to adjust the CN glyph, the 夕 part, so the dot can touch the ク part, and in turn match the balance of the JP aesthetic. Then the TW/HK region can be mapped to the adjusted CN.

Otherwise, adjust the 夕 part of the JP glyph of 名 to the CN form.

多 component

The characters to review are:

Character	Unicode
多	U+591A
㚉	U+3689
侈	U+4F88
卶	U+5376
陊	U+964A
㝖	U+3756
㞔	U+3794
㡅	U+3845
㢁	U+3881
哆	U+54C6
垑	U+5791
奓	U+5953
姼	U+59FC
恀	U+6040
拸	U+62F8
茤	U+8324
迻	U+8FFB
㩼	U+3A7C
㶴	U+3DB4
䏧	U+43E7
扅	U+6245
栘	U+6818
爹	U+7239
𡖔	U+21594
𤥀	U+24940
㢋	U+388B
䇋	U+41CB
偧	U+5067
够	U+591F
夠	U+5920
痑	U+75D1
眵	U+7735
移	U+79FB
袳	U+88B3
㚊	U+368A
㷇	U+3DC7
夡	U+5921
翗	U+7FD7
蛥	U+86E5
袲	U+88B2
𣃽	U+230FD
䐒	U+4412
䡔	U+4854
嗲	U+55F2
誃	U+8A83
趍	U+8D8D
跢	U+8DE2
㒅	U+3485
㗬	U+35EC
㚋	U+368B
㚌	U+368C
䃎	U+40CE
䔟	U+451F
䬷	U+4B37
夥	U+5925
鉹	U+9279
𣻗	U+23ED7
䋾	U+42FE
䫂	U+4AC2
夦	U+5926
熪	U+71AA
䮈	U+4B88
橠	U+6A60
郺	U+90FA
簃	U+7C03
㚍	U+368D
㿐	U+3FD0
謻	U+8B3B
黟	U+9EDF
䵙	U+4D59
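Many entries in these lists are encoded outside the main CJK Unified Ideographs block (U+4E00..U+9FFF), in Extension A (U+3400..U+4DBF) or Extension B (U+20000..U+2A6DF). Block membership can serve as a rough proxy for rarity when deciding which characters to review first. A minimal Python sketch that groups characters by block; only these three blocks are hard-coded, so anything else falls into "other", and using blocks as a rarity signal is an assumption, not part of the proposal.

```python
from collections import defaultdict
from typing import Dict, List

# Block ranges from the Unicode standard; only the blocks needed for the 多 list above.
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DF),
]


def block_of(ch: str) -> str:
    """Name of the hard-coded block containing `ch`, or "other"."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def group_by_block(chars: str) -> Dict[str, List[str]]:
    """Group characters by block, keeping their original order within each group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ch in chars:
        groups[block_of(ch)].append(ch)
    return dict(groups)


# 多 (U+591A), 㚉 (U+3689) and 𣃽 (U+230FD) from the table above, written as escapes.
print(group_by_block("\u591A\u3689\U000230FD"))
```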

In this picture, blue marks v1 JP glyphs that were removed (and cannot be restored), and grey means v1 does not have the glyph.

[Screenshot 2022-02-21 at 18 49 11]

死 component

The characters to review are:

Character	Unicode
死	U+6B7B
㱝	U+3C5D
㘸	U+3638
屍	U+5C4D
㰷	U+3C37
毙	U+6BD9
塟	U+585F
臰	U+81F0
葬	U+846C
薧	U+85A7
薨	U+85A8
斃	U+6583
髒	U+9AD2

Blue marks v1 JP glyphs that cannot be restored.

[Screenshot 2022-02-21 at 19 17 41]

Feb 21 '22 11:02 Marcus98T

I am not sure whether we should unify the 祭 component for Source Han Sans because, as I have observed, the top-right part of 祭 (open loop for JP, closed loop for CN) is treated as a regional difference. However, I personally think this "regional difference" is not very noticeable and should be consolidated, either to the open JP form or to the closed CN form. If the closed CN form is preferred, the JP glyphs must be adjusted rather than replaced by the CN glyphs, because the JP glyphs are better designed (with the exception of characters that contain 宀, e.g. 察, since 宀 is an essential regional difference).

Cyan marks the Adobe-Japan1 glyphs.

The characters in the list are: 祭傺䏅㗫摖憏暩漈瘵蔡穄縩磜際㡜察鰶𬶭攃櫒䌨礤聺嚓𩟔擦鑔镲檫䕓䃰

Please also adjust 聺 (U+807A, marked in yellow in the image) to be more consistent with the other characters.
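To make the inline list of 祭-component characters easier to cross-reference with the tab-separated tables used elsewhere in this issue, it can be expanded into the same Character/Unicode layout. A minimal Python sketch; `to_table` is a hypothetical helper, and the string is copied verbatim from the list above.

```python
def to_table(chars: str) -> str:
    """Render characters as the tab-separated Character/Unicode table used in this issue."""
    rows = ["Character\tUnicode"]
    rows += [f"{ch}\tU+{ord(ch):04X}" for ch in chars]
    return "\n".join(rows)


# Characters containing the 祭 component, copied from the list above.
COMPONENT_LIST = "祭傺䏅㗫摖憏暩漈瘵蔡穄縩磜際㡜察鰶𬶭攃櫒䌨礤聺嚓𩟔擦鑔镲檫䕓䃰"

table = to_table(COMPONENT_LIST)
print(table)
```

The row for U+807A is the 聺 glyph flagged above; supplementary-plane characters such as 𬶭 come out with five hex digits.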

Mar 03 '23 08:03 Marcus98T

Earlier I said that 女, 弓 and 廴 in JP/KR should have their feet removed and follow CN for all regions. This may not happen, because it could be controversial, but I am putting it here for organisational purposes and will edit that part out of my original issue. If the design team believes the feet are an essential regional difference, do not consolidate them.

Here is the original quote (slightly edited, with a new picture):

This issue only pertains to Source Han Sans and does not apply to Serif, due to their different design principles. While this may be debatable, I think the feet in JP and KR should perhaps be removed in radicals and components such as 女, 弓 and 廴, adopting the CN form. If this suggestion is implemented, the other hyogaiji kanji outside Adobe-Japan1-6 would look more consistent by comparison.

My rationale at the time was that some Japanese fonts remove the feet for UD (Universal Design) purposes (although not necessarily in all characters).

Here is the quote:

However, if the 廴 component cannot be merged because it would compromise the region-differentiating designs, the 女 and 弓 components should still be merged. Some Japanese UD fonts (like UD Shin Go) and Hiragino Sans (except for the characters on their own) also remove these feet from the latter components.

I also originally asked for 廴 to be redesigned back to its v1 form, but I have retracted this request because of the following comments by @NightFurySL2001:

Strongly advocate against such design-breaking decisions. If that is the case, it should be CN that should be changing as there is no strict rules on these parts.

For CN, TW and HK, I suggest once again redesigning the 辶 component (TW and HK only) and the 廴 component so that the latter can be shared across all regions, as seen in the picture of the "new" 廴 radical above.

Refer to the CN glyph in the font. No redesign required.

This is still design-breaking changes. Note the top right corner has the same angled corner and the current design matches that corner. Changing both corners requires changing all 折 designs (including but not limited to 又夂夕女[on left]也矛甬糹东车经).

EDIT: Edited last quote in this post for clarity.

Mar 17 '23 06:03 Marcus98T

I have also recently become aware of a request, posted 1.5 years ago as of this writing, to disunify 豆 and 立 for TW and HK use. However, it was posted on the issue about unifying the 辶 component between CN and JP, where it is not relevant, so I am quoting it here:

P/S: If this is to be viewed as a regional difference, then the requirement of two dots in 立 and 豆 to be connected to the top and bottom stroke should be implemented for TW and HK as this is a written requirement in Taiwan's 國字標準字體教師手冊, unlike this differentiation of 辶 component which is not specified anywhere in any of China's official document.

立: https://language.moe.gov.tw/001/Upload/files/SITE_CONTENT/M0001/STD/p173.htm?open

豆: https://language.moe.gov.tw/001/Upload/files/SITE_CONTENT/M0001/STD/p203.htm?open

Originally posted by @NightFurySL2001 in https://github.com/adobe-fonts/source-han-sans/issues/300#issuecomment-948327003

Here is my view (keeping it relevant to this thread): I strongly advise against this, because I do not believe strictly following the 台標 (Taiwan MOE standard) would bring any benefit. It would make the text look worse for Traditional Chinese users and could add CIDs rather than reduce them. Even a commercial 台標 font such as Hiragino Sans CNS (for TW/HK use) uses the same glyphs here as Hiragino Sans GB (for CN use), as pictured below.

This is also why I will not retract my request that the top 竹 radical be unified to the JP form for all regions, although it would probably not happen for v3.

I do not think the 台標 standard should be treated as a strict guideline. I am not saying Adobe should stop supporting 台標, because most other components will have to follow it (like the 肉月 radical in 腔, 脞, 胞, etc.); rather, the designers should balance aesthetics against standards.

Mar 20 '23 17:03 Marcus98T
