OpenCC

`s2t` converts “背包” to “揹包”, but neither `t2s` nor `tw2s` converts it back

Open NaitLee opened this issue 2 years ago • 3 comments

The definitions of the phrases containing “揹” (used by s2t) are around here.

But there is no “揹 背” entry in TSCharacters.txt.

To reproduce:

$ echo "背包" | opencc -c s2t.json | opencc -c t2s.json
揹包
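
The failure can also be modelled offline in a few lines of Python. This is only a rough sketch, not OpenCC's implementation: it assumes the dictionary text format is `key<TAB>value1 value2 ...`, uses made-up sample excerpts rather than the real file contents, and converts back character by character, ignoring OpenCC's phrase segmentation. The names `parse_dict`, `to_simplified`, and `round_trip_failures` are hypothetical.

```python
def parse_dict(text):
    """Parse lines of the assumed form 'key<TAB>value1 value2 ...' into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, values = line.partition("\t")
        table[key] = values.split()
    return table


def to_simplified(text, ts_chars):
    """Map each character via the first candidate; unmapped characters pass through."""
    return "".join(ts_chars.get(ch, [ch])[0] for ch in text)


def round_trip_failures(st_phrases, ts_chars):
    """Return (simplified, traditional, back) triples where s2t then t2s differs."""
    failures = []
    for simplified, candidates in st_phrases.items():
        traditional = candidates[0]  # take the first listed candidate
        back = to_simplified(traditional, ts_chars)
        if back != simplified:
            failures.append((simplified, traditional, back))
    return failures


# Illustrative excerpts only (assumed, not copied from the real dictionary files).
ST_PHRASES_SAMPLE = "背包\t揹包\n头发\t頭髮\n"
TS_CHARS_SAMPLE = "頭\t头\n髮\t发\n"

st_phrases = parse_dict(ST_PHRASES_SAMPLE)
ts_chars = parse_dict(TS_CHARS_SAMPLE)

# Without a mapping for 揹, the phrase 背包 does not survive the round trip.
failures = round_trip_failures(st_phrases, ts_chars)
print(failures)

# Adding the requested mapping (揹 -> 背) makes the round trip succeed.
ts_chars_fixed = parse_dict(TS_CHARS_SAMPLE + "揹\t背\n")
failures_fixed = round_trip_failures(st_phrases, ts_chars_fixed)
print(failures_fixed)
```

A round-trip mismatch is not always a bug, since some simplified characters legitimately map to several traditional ones, so a check like this only flags candidates for review.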

AFAIK “揹” is not used in Simplified Chinese at all, so please add “背” as its simplification :)

Note: “揹” seems to be just a variant of “背” in both Traditional and Simplified Chinese. Both “揹” and “背” appear in web search results from sites that use Traditional Chinese, so both are acceptable there.

NaitLee, Dec 16 '22 08:12

“揹” is simply a variant character; I suggest deleting it.

ayaka14732, Jan 09 '23 03:01

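Under OpenCC's principle of “split where possible, do not merge” (能分則不合), a character like “揹”, which marks a finer-grained usage, is actually reasonable. According to this source, “揹” counts as a traditional character. But some dictionaries (such as this one) list it as a variant character. Reportedly neither the Kangxi Dictionary (《康熙字典》) nor the Shuowen Jiezi (《說文解字》) includes it. The lower component of “背” is “肉” (flesh); the character can refer to the shoulders and the back, and as a verb it already carries the meaning “to bear a load”. By that logic, “揹” should probably be treated as a variant. Judging from related web searches (mostly product listings from Hong Kong and Taiwan online shops), both “背包” and “揹包” are in use. The final decision is best left to the experts 😄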

Either way, this mapping needs to be added to t2s: Simplified Chinese does not use “揹”, so it should be replaced wherever it appears.

NaitLee, Jan 09 '23 07:01

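The Table of General Standard Chinese Characters (《通用規範漢字表》) lists “背” as the standard character and “揹” as a variant. What OpenCC calls Simplified Chinese is the set of Chinese standard characters, so in principle the variants listed in that table should be converted to their standard forms.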

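Besides this character, there are many other variants that could be converted to standard characters on the same principle. I opened a PR for this in #492 a long time ago, but at the time the project lead said it needed further discussion, and I don't know where that stands now.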

danny0838, Jan 09 '23 11:01