Remove the beginner's trap: make it clearer which font you need to choose for each language!

Open WingFoxie opened this issue 4 years ago • 12 comments

Problem description:

I've seen game developers trying to support Simplified Chinese. They downloaded this font from "releases" (the 117 MB SourceHanSans.ttc file), installed it, and chose "Source Han Sans" in the font list, which is the WRONG one to choose!

When the font is installed, there is no such font named "Source Han Sans CN". The CN font is always named “思源黑体”. They should pick “思源黑体”. But how could they know?

On this GitHub repository's front page, everyone sees a file list that includes SourceHanSans_CN_sequences.txt, README-CN.md, UniSourceHanSansCN-UTF32-H, and UniSourceHanSansHWCN-UTF32-H. This suggests that the Simplified Chinese font should have "CN" in its font name. But that's NOT true!

The font name is always "思源黑体", no matter what Windows display language they use. (I verified this by using Windows Sandbox, which has region set to United States and language set to English.)

In your "readme.md" I see this: "To help decide which fonts to download, please refer to the Configurations section of the official font readme file." But that file doesn't make this clear either! In that readme file, names like "Source Han Sans SC" are mentioned everywhere, but "Source Han Sans SC" is NOT in the font list once the user has installed the font.

The only place where "思源黑体" is shown is page 20 of that official font readme file, but:

  1. It doesn't tell you that you should look for "思源黑体" in the font list.
  2. That page isn't meant to tell you that in the first place.

For Simplified Chinese you should look for "思源黑体", not "Source Han Sans SC", in the font list. But for Japanese, you actually look for "Source Han Sans", not "源ノ角ゴシック", in the font list!

When "Source Han Sans" is chosen, some characters like "包 画 复" are not displayed correctly. Some characters fail to display at all and show a tofu sign instead.

If they choose to download the over 2 GB "source.zip" instead, they can find a "SourceHanSansSC-Regular.otf" file in it. But when they install it, it still appears as "思源黑体" in the font list!! (Verified using Windows Sandbox: the font viewer displays the name of "SourceHanSansSC-Regular.otf" as "Source Han Sans SC", not "思源黑体". Yet once this font is installed, it shows up as "思源黑体" in the font list, NOT as "Source Han Sans SC".)

Again: how, could, they, know to choose "思源黑体" when they only see "Source Han Sans SC" everywhere else?

Edit: Tried Google's "Noto Sans CJK SC Regular" font and it doesn't have this problem. The installed font shows up as "Noto Sans CJK SC Regular", not under some name that never showed up anywhere else.

WingFoxie avatar May 16 '20 19:05 WingFoxie

The 117 MB SourceHanSans.ttc file should contain ALL supported languages and weights of Source Han Sans/思源黑体. If some characters are not displayed properly, then the problem is with the software, not the font itself. Ideally, all characters should display properly (no tofu) no matter which language variant of the Source Han Sans family you choose.

Explorer09 avatar May 18 '20 01:05 Explorer09

Try Microsoft Word, for example. If you paste "包 画 复" in there and choose "Source Han Sans" from Word's font options, it will render differently than what you see on this webpage. The character "包" will have the center "口" opened at the left. The character "画" will have the center "田" connected to the top "一". The character "复" will be a weird slimmer version.

It renders correctly only if you choose "思源黑体" from the font options.

They have to choose "思源黑体" instead of "Source Han Sans" for the characters to be rendered correctly. What's more, they cannot choose "Source Han Sans SC", because there will be no such option in the list. (Assuming they installed the 117 MB SourceHanSans.ttc, of course.)

WingFoxie avatar May 18 '20 01:05 WingFoxie

@WingFoxie Did you set the correct language for the Microsoft Word document?

Explorer09 avatar May 18 '20 01:05 Explorer09

I think the problem is with:

  1. The TC/SC/HC/K versions of Source Han have names like "Source Han Sans K" that indicate the regional variant they represent, but the Japanese variant is not labelled with anything similar.
  2. Installed font names are localized. A Traditional Chinese user who has installed all the font variants can see Source Han HC/K/SC/etc. in the Source Han section of the installed font list, but not TC. They have to scroll to the Chinese font name section to find the TC font, because its name has been localized into Traditional Chinese. Likewise, an SC user can see Source Han HC/K/TC/etc. in the Source Han section of the font list, but has to scroll to the Chinese font name section to find the SC font, because its name has been localized into Simplified Chinese.

c933103 avatar May 18 '20 02:05 c933103

I think I know where the problem is. This "Source Han Sans" thing is for CJK characters. CJK means Chinese (Simplified and Traditional), Japanese, and Korean.

So the installed font names should be: SC: "Source Han Sans SC", TC: "Source Han Sans TC", JP: "Source Han Sans JP", KR: "Source Han Sans KR". (Google's Noto Sans is named this way: "Noto Sans CJK SC", "Noto Sans CJK TC", "Noto Sans CJK JP", and "Noto Sans CJK KR".)

But Adobe did this instead. Their font names are: SC: "思源黑体", TC: "思源黑體", JP: "Source Han Sans", KR: "Source Han Sans K".

See where the problem is? The Japanese font is called "Source Han Sans", making it appear to be the "master" font of all these. So whoever chooses this font to display Chinese characters will end up displaying these characters in the Japanese way.


But... In their defense, if they used the Chinese font instead, they would face the opposite problem: some Japanese characters rendered in the Chinese way.

The root problem comes from the Chinese and Japanese languages. The Japanese versions of "包 画 复" are different from the Chinese versions. But since they share the same meaning and the same origin, they have to use the same entry (the same code point) in the fonts. Buuuut, Chinese and Japanese have developed separately for hundreds of years, so the way to write these characters isn't the same anymore.
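To make the "same entry" point concrete, here is a small Python sketch that prints the Unicode code point of each sample character. The helper name `code_points` is made up for illustration. The point is that the text itself carries only these code points and nothing about the language, so the regional form has to come from the font choice or from a language tag.

```python
def code_points(text: str) -> list[str]:
    """Return the code point of every non-space character as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text if not ch.isspace()]


# The same three code points are used for Chinese and Japanese text alike.
print(code_points("包 画 复"))  # ['U+5305', 'U+753B', 'U+590D']
```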

So what actually happened is that Adobe tried to make this easier for everyone, so that you could just choose "Source Han Sans" to show SC, TC, JP, and KR characters correctly. But they sort of failed.

Therefore, to achieve perfect results, the user has to manually choose the corresponding font for the text. Choose "思源黑体" for Simplified Chinese text, choose "Source Han Sans" for Japanese text, and so on.
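For a game or app that picks fonts in code, the manual choice could be sketched roughly like this. The family names are the ones reported in this thread for the installed fonts. The language tags, the `ALIASES` table, and the function name `family_for` are assumptions made for illustration, not anything defined by the font or by a particular engine.

```python
# Installed family names as reported in this thread (may differ by font version).
FAMILY_BY_LANG = {
    "zh-Hans": "思源黑体",       # Simplified Chinese
    "zh-Hant": "思源黑體",       # Traditional Chinese
    "ja": "Source Han Sans",    # Japanese (the unlabelled default)
    "ko": "Source Han Sans K",  # Korean
}

# Hypothetical region-to-script aliases, just for this sketch.
ALIASES = {"zh-CN": "zh-Hans", "zh-TW": "zh-Hant"}


def family_for(lang_tag: str) -> str:
    """Return the family name to request for a given language tag."""
    tag = ALIASES.get(lang_tag, lang_tag)
    if tag in FAMILY_BY_LANG:
        return FAMILY_BY_LANG[tag]
    primary = tag.split("-")[0]  # e.g. "ja-JP" -> "ja"
    if primary in FAMILY_BY_LANG:
        return FAMILY_BY_LANG[primary]
    # A bare "zh" is ambiguous between SC and TC, so refuse to guess.
    raise KeyError(f"no font family known for language tag {lang_tag!r}")


print(family_for("zh-CN"))
print(family_for("ja-JP"))
```

Raising on an unknown or ambiguous tag is a deliberate choice here: silently falling back to "Source Han Sans" would reproduce exactly the trap described above.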

WingFoxie avatar May 18 '20 02:05 WingFoxie

Meanwhile, Google's "Noto Sans" approach is different: they didn't try to make a "master" font, because they knew it's impossible. They named the fonts in a way that encourages the user to choose the language manually, every time. And they named all the fonts in English, so English users will easily know which font to choose for SC text: the font with "SC" in its name, of course.

But with Adobe's "Source Han Sans" font, there's no easy way for an English user to see that they should choose "思源黑体" for SC text. The benefit is that Chinese users can easily figure out which font to choose: the font with a Chinese name, of course.

WingFoxie avatar May 18 '20 02:05 WingFoxie

The problem is that the name of the installed font changes according to the system language setting, in addition to the Japanese font variant being unlabelled. [image: IMG_20200518_104554]

c933103 avatar May 18 '20 02:05 c933103

This sheet is NOT correct according to what I experienced. My user language is SC, and the name of the TC font is "思源黑體", not "Source Han Sans TC". Also I tried to install the font in Windows Sandbox, which has its language set to English. The name of TC font is still "思源黑體", the name of the SC font is still "思源黑体", the name of the JP font is still "Source Han Sans".

(By "the name", I mean the name shown in the font list of any text editor, like Microsoft Word, after the font is installed. Note that this name is NOT always the same as the name shown if you just open the font file and preview it. For example, if you preview the 16.4 MB "SourceHanSansSC-Regular.otf", it will tell you that the font name is "Source Han Sans SC", but when you install it, the name that appears will be "思源黑体" instead!)
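One plausible explanation for the mismatch is that a font file can store several family names at once, one per language, in its OpenType "name" table, and different programs pick different records. The sketch below parses a raw format-0 "name" table and lists the Windows family names (nameID 1) by language. The demo table is synthetic and only assumed to mirror what this thread describes. It was not read from the real font, and `build_name_table` is a hypothetical helper that exists only so the example runs without a font file.

```python
import struct

# Windows language IDs (platformID 3) relevant to this thread.
WINDOWS_LANG = {0x0409: "en-US", 0x0804: "zh-CN", 0x0404: "zh-TW",
                0x0411: "ja-JP", 0x0412: "ko-KR"}


def family_names(name_table: bytes) -> dict[str, str]:
    """Map language -> family name for Windows nameID 1 records."""
    _fmt, count, string_offset = struct.unpack_from(">HHH", name_table, 0)
    result = {}
    for i in range(count):
        # Each name record is six big-endian uint16 fields (12 bytes).
        platform, _enc, lang, name_id, length, offset = struct.unpack_from(
            ">6H", name_table, 6 + 12 * i)
        if platform == 3 and name_id == 1:
            start = string_offset + offset
            raw = name_table[start:start + length]
            # Windows-platform strings are stored as UTF-16BE.
            result[WINDOWS_LANG.get(lang, hex(lang))] = raw.decode("utf-16-be")
    return result


def build_name_table(names: dict[int, str]) -> bytes:
    """Demo helper: build a minimal table with one family name per language."""
    records, strings = b"", b""
    for lang, text in sorted(names.items()):
        raw = text.encode("utf-16-be")
        records += struct.pack(">6H", 3, 1, lang, 1, len(raw), len(strings))
        strings += raw
    header = struct.pack(">HHH", 0, len(names), 6 + 12 * len(names))
    return header + records + strings


# Synthetic stand-in for the SC font's name table (an assumption).
demo = build_name_table({0x0409: "Source Han Sans SC", 0x0804: "思源黑体"})
print(family_names(demo))
```

With a real font, the "name" table would first have to be located through the file's table directory, or read with a font library. That step is left out here.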

WingFoxie avatar May 18 '20 03:05 WingFoxie

Actually, Source Han Sans is a Pan-CJK font, and one of the highlights of this font is that the glyphs of different regions are encapsulated into one single font resource. The idea is that you choose the font, tag it with the correct language identifier, and you get the desired result.

When Adobe first released Source Han Sans in 2014, there was only one single version of the OTF, and the font name was "Source Han Sans". The default language for this font was Japanese. Users had to tag the text in the application to access the glyphs for other regions.

A problem with this approach is that software support is required, and software supporting this feature is fairly limited. This is why "Language Specific OTFs" (different sets of font files with different regions as the default) have been released since version 1.001. The name "Source Han Sans" wasn't changed, mostly for backward compatibility reasons. (ref: https://github.com/adobe-fonts/source-han-serif/issues/50)

One thing worth noting is that the Pan-CJK nature is still preserved in these Language Specific OTFs. Even if you've chosen "思源黑体" as the font, you can access the glyphs of other regions through language tagging.

You can try this in MS Word. On Windows, though, you need a trick to make it turn on the OpenType "locl" (localized forms) feature:

  1. Choose "Source Han Sans". Japanese glyphs will be shown.
  2. Open the font dialog, click on the "Advanced" tab, and enable any one of the OpenType features (choose any value other than default/none).
  3. Now you can access the glyphs of different regions by selecting the text and changing it to another "language".
     ![word-localized](https://user-images.githubusercontent.com/959433/82170865-4549d780-98f8-11ea-93eb-47d7be3a04ad.png)

(ref: https://github.com/adobe-fonts/source-han-serif/issues/81)

So it isn't entirely "wrong" to use Source Han Sans. The point is whether the software supports language tagging, and whether it is correctly tagged to use the desired form.

Of course, choosing the language-specific version via the font menu also works, and it saves the language-tagging step. But even if you choose the correct version, the resulting glyphs can still be "wrong" if you have inadvertently tagged the text as another language.
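Outside of Word, the same idea of tagging text with a language can be illustrated with HTML, where the standard `lang` attribute carries the tag. The Python sketch below only generates such markup. The helper name `tagged_span` is hypothetical, and whether a given renderer actually switches to the regional glyphs depends on its support for language tagging, as described above.

```python
from html import escape


def tagged_span(text: str, lang: str, family: str = "Source Han Sans") -> str:
    """Wrap text in a <span> that declares both its language and its font."""
    style = f"font-family: '{escape(family)}'"
    return f'<span lang="{escape(lang)}" style="{style}">{escape(text)}</span>'


# Same font, same characters, different language tags.
print(tagged_span("包 画 复", "zh-Hans"))
print(tagged_span("包 画 复", "ja"))
```

The font stays the same in both spans. Only the language tag differs, which is the approach the Pan-CJK design is built around.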

tamcy avatar May 18 '20 03:05 tamcy

This is informative. Thanks a lot!

Now I can see why Adobe's approach is a bit stupid. It doesn't really solve the problem, it just moves the problem from one place to another. Users still have to manually declare which text is in which language for the text to show correctly.

On the other hand, this seems to be the right thing to do. In the future all text will carry hidden information telling the system what language the text is in, and copying any text will also copy the language information with it. (That's exactly why the language information should be with the text, not with the font choice.) Only then will the problem be truly solved, and the user can just choose "Source Han Sans" to have all CJK characters rendered correctly.

WingFoxie avatar May 18 '20 04:05 WingFoxie

  1. Can it be considered a bug that in Microsoft Office programs one needs to manually enable OpenType features before the program takes regional differences into account?
  2. So if one manually changes the OpenType feature setting, is there no way to switch only part of a document (e.g. an MS Word document), rather than the entire document, to the Japanese variant of the font?

c933103 avatar May 18 '20 20:05 c933103

  1. There's much more to do than just enabling the OpenType features in Microsoft Office. The problem I complained about is mostly seen in games. Games that support SC, TC, and JP render some SC characters in the JP way, because the program doesn't know that 复 should be a Chinese 复, not a Japanese 复. Game engines need to support this. Eventually everything that contains text (from webpages to file names on your hard drive) should support this "language tagging" feature to make this work perfectly. But that could take forever.

  2. You can select a piece of text first, and then click the language shown in the bottom status bar, to tell Word "what language this piece of text is in". In this way you can define different languages for different parts of the document.

WingFoxie avatar May 18 '20 21:05 WingFoxie