zero-epwing
How to convert {{w_xxxxx}} and {{n_xxxxx}} to Unicode
According to 外字Unicodeマップ (the gaiji Unicode map page, http://ebstudio.info/manual/EBWin4_man/0_4_5.html), the map file content looks like hA121 u00E0. There is no 'w' or 'n' in it.
Those are indices into the character map for the given dictionary. Yomichan-Import has code to parse these entries, you can check it out here: https://github.com/FooSoft/yomichan-import/blob/master/epwing.go#L172
Character tables have to be created for every EPWING dictionary, since certain 外字 have glyphs that would normally be rendered inside the text.
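To make the lookup concrete, here is a minimal Python sketch of the substitution step. It is not the Yomichan-Import code (that is Go, linked above), and it rests on assumptions: the number in the marker is taken to be the glyph code in decimal, `n` and `w` are taken to select separate narrow and wide tables, and the table contents and the `replace_gaiji` helper are made up for illustration.

```python
import re

# Hypothetical per-dictionary tables: glyph code -> replacement text.
# Assumption: the marker number is the decimal form of the hex glyph code,
# so {{n_41249}} would refer to code 0xA121.
NARROW = {0xA121: "\u00e0"}  # à
WIDE = {0xA577: "\u95bd"}    # 閽

# Matches {{n_12345}} or {{w_12345}}.
MARKER = re.compile(r"\{\{([nw])_(\d+)\}\}")


def replace_gaiji(text, narrow, wide, missing="\ufffd"):
    """Replace gaiji markers in text using the given tables.

    Codes that are not in the table become `missing` (U+FFFD by default).
    """
    def substitute(match):
        table = narrow if match.group(1) == "n" else wide
        return table.get(int(match.group(2)), missing)

    return MARKER.sub(substitute, text)


if __name__ == "__main__":
    print(replace_gaiji("caf{{n_41249}} {{w_42359}}", NARROW, WIDE))
```

Unknown codes are replaced with U+FFFD rather than dropped, so gaps in a table stay visible in the output.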
> Character tables have to be created for every EPWING dictionary
Is that what you mean by a character table?
zA577 u95BD # 閽
zA578 u8772 # 蝲
zA579 u6A1D # 樝
zA57B u95AB # 閫
zA57C u95D0 # 闐
zA57D u9F97 # 龗
zA57E u5B7D # 孽
zA621 u97DB # 韛
zA622 u65F0 # 旰
zA623 u74EB # 瓫
Because if that's the case, installing EBWin4 and browsing to C:\Users\username\AppData\Roaming\EBWin4\GAIJI gives you a lot of tables. There are tables for kojien, wadai, meikyou, daijirin, ...
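If those files are usable, reading them could look roughly like the Python sketch below. It only assumes the line shape shown above (a prefix letter plus a hex code, then `u` plus a hex code point, then an optional `#` comment). Treating `h` as the narrow table and `z` as the wide table, and accepting several comma-separated code points, are my guesses rather than anything confirmed in this thread. `parse_gaiji_map` is a hypothetical helper name.

```python
def parse_gaiji_map(lines):
    """Parse EBWin-style map lines into (narrow, wide) dicts.

    Each dict maps the integer glyph code to its replacement string.
    Lines that do not look like 'hXXXX uXXXX' or 'zXXXX uXXXX' are skipped.
    """
    narrow, wide = {}, {}
    for raw in lines:
        # Drop the trailing '# ...' comment, if any.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1]
        if key[0] not in ("h", "z"):
            continue
        points = value.split(",")
        if not all(p.startswith("u") and len(p) > 1 for p in points):
            continue
        try:
            code = int(key[1:], 16)
            text = "".join(chr(int(p[1:], 16)) for p in points)
        except ValueError:
            continue  # not valid hex, or not a valid code point
        # Assumption: 'h' = narrow (half-width), 'z' = wide (full-width).
        (narrow if key[0] == "h" else wide)[code] = text
    return narrow, wide


if __name__ == "__main__":
    sample = ["zA577 u95BD # 閽", "zA578 u8772 # 蝲", "hA121 u00E0"]
    narrow, wide = parse_gaiji_map(sample)
    print(narrow, wide)
```

For a real file, pass the opened file object as `lines`. The encoding of the EBWin4 map files is not stated here, so it would need to be checked before calling `open`.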
@FooSoft I noticed that OCR of the 外字 for several major dictionaries has been done in yomichan-import. Would you mind suggesting or sharing how the OCR can be done in batch? I would like to contribute to the repo, but I am stuck on the OCR part...
[Image: issue-ocr]
Thanks in advance!