ibus-table
Remove CJK compatibility characters
For example, both U+88CF and U+F9E7 are present in the database, sharing the same glyph and the same Wubi encoding. Users cannot tell them apart visually.
The latter (U+F9E7) belongs to the CJK Compatibility Ideographs block and thus should not appear in a database of Chinese characters.
There are probably other CJK Compatibility Ideographs that should be removed as well.
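To find the other affected characters, something like the following might help. It is only a sketch: the names `is_compat_ideograph` and `compat_report` are made up for illustration, and it assumes that "compatibility ideograph" means a code point in U+F900..U+FAFF or U+2F800..U+2FA1F that has a decomposition. The decomposition check matters because a few code points in the first block (U+FA0E, for example) are ordinary unified ideographs and should stay.

```python
import unicodedata

# Code point ranges of the two CJK compatibility ideograph blocks.
COMPAT_RANGES = ((0xF900, 0xFAFF), (0x2F800, 0x2FA1F))


def is_compat_ideograph(ch: str) -> bool:
    """True if ch is in a compatibility block and decomposes to another character."""
    cp = ord(ch)
    in_block = any(lo <= cp <= hi for lo, hi in COMPAT_RANGES)
    # Unified ideographs that live in the block (e.g. U+FA0E) have no
    # decomposition, so they are not reported.
    return in_block and unicodedata.decomposition(ch) != ""


def compat_report(text: str) -> dict:
    """Map each compatibility ideograph in text to its unified equivalent."""
    return {
        f"U+{ord(ch):04X}": f"U+{ord(unicodedata.normalize('NFC', ch)):04X}"
        for ch in sorted(set(text))
        if is_compat_ideograph(ch)
    }


if __name__ == "__main__":
    # Sample text only; in practice this would be the contents of a table file.
    print(compat_report("yjfe \uf9e7 \u88cf"))
```

Running `compat_report` over the full text of a table file would list every compatibility ideograph it contains together with the unified character it duplicates.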
Do you mean wubi-jidian86.txt or wubi-haifeng86.txt or both?
Both tables seem to have this problem:
U+88CF:
$ grep 裏 tables/wubi-jidian/*.txt tables/wubi-haifeng/*.txt
tables/wubi-jidian/wubi-jidian86.txt:yjfe 裏 85400000
tables/wubi-haifeng/wubi-haifeng86.txt:yjfe 裏 1000
U+F9E7:
$ grep 裏 tables/wubi-jidian/*.txt tables/wubi-haifeng/*.txt
tables/wubi-jidian/wubi-jidian86.txt:yjfe 裏 12000000
tables/wubi-haifeng/wubi-haifeng86.txt:hgje 裏. 100
tables/wubi-haifeng/wubi-haifeng86.txt:yjfe 裏 999
tables/wubi-haifeng/wubi-haifeng86.txt:yjfe 裏. 100
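Since the two characters render identically, the grep output alone does not show which one matched. A small sketch that searches by code point and spells out the non-ASCII code points of each hit could look like this. `grep_codepoint` is a hypothetical helper, and the sample lines are modelled on the wubi-jidian86.txt entries quoted above (the real file may separate fields with tabs rather than spaces).

```python
def grep_codepoint(lines, codepoint):
    """Yield (line_number, line, codepoints) for lines containing chr(codepoint)."""
    target = chr(codepoint)
    for number, line in enumerate(lines, start=1):
        if target in line:
            # Label every non-ASCII character so look-alikes can be told apart.
            labels = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 0x7F]
            yield number, line.rstrip("\n"), labels


# Sample data modelled on the two wubi-jidian86.txt entries from this thread.
sample = ["yjfe \u88cf 85400000\n", "yjfe \uf9e7 12000000\n"]

if __name__ == "__main__":
    for hit in grep_codepoint(sample, 0xF9E7):
        print(hit)
```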
Do you know why the wubi-haifeng86.txt file has so many entries where the character is followed by "." (U+002E FULL STOP)?
Should I delete all lines which contain CJK Compatibility Ideographs from both tables?
In the case of wubi-haifeng86.txt, this would mean deleting
tables/wubi-haifeng/wubi-haifeng86.txt:hgje 裏. 100
tables/wubi-haifeng/wubi-haifeng86.txt:yjfe 裏 999
tables/wubi-haifeng/wubi-haifeng86.txt:yjfe 裏. 100
(and all the other lines containing CJK Compatibility Ideographs).
Is that what I should do?
Yes, I believe so.
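For reference, the deletion could be sketched roughly as below. This is not the script used for the tables, just an illustration under assumptions: `has_compat_ideograph` and `split_table` are made-up names, a line is dropped if it contains any code point from U+F900..U+FAFF or U+2F800..U+2FA1F that has a decomposition, and the sample lines mirror the wubi-haifeng86.txt entries quoted earlier.

```python
import unicodedata


def has_compat_ideograph(line: str) -> bool:
    """True if the line contains at least one CJK compatibility ideograph."""
    for ch in line:
        cp = ord(ch)
        in_block = 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F
        # Only characters with a decomposition are real duplicates; unified
        # ideographs located in the block (e.g. U+FA0E) are left alone.
        if in_block and unicodedata.decomposition(ch):
            return True
    return False


def split_table(lines):
    """Return (kept, removed) lists, preserving the original line order."""
    kept, removed = [], []
    for line in lines:
        (removed if has_compat_ideograph(line) else kept).append(line)
    return kept, removed


# Sample data modelled on the wubi-haifeng86.txt entries from this thread.
sample = [
    "yjfe \u88cf 1000",
    "hgje \uf9e7. 100",
    "yjfe \uf9e7 999",
    "yjfe \uf9e7. 100",
]

if __name__ == "__main__":
    kept, removed = split_table(sample)
    print("kept:", kept)
    print("removed:", removed)
```

Returning the removed lines separately makes it possible to review them before the table file is rewritten.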