PdfPig

Chinese parsing garbled characters

Open: asural opened this issue 3 years ago • 1 comment

We could not find the corresponding character with code 54992 in font KaiTi_GB2312. (screenshot attached)

asural, May 13 '22 09:05

The file in question is available here: https://github.com/asural/attachment/blob/master/file/issue%23455.pdf

asural, May 13 '22 09:05

Fix checked in.

File: src\UglyToad.PdfPig\PdfFonts\Composite\Type0Font.cs
Method: TryGetUnicode

Use the CMap to convert the characterCode to a CID first, then use ucs2CMap to convert that CID to Unicode.

characterCode ----by CMAP---> CID ---ucs2Map---> Unicode

Page 1:

Text: 中航动力控制股份有限公司 2010年半年度报告 中航动力控制股份有限公司董事会 2010年8月17日

Character code conversion for each letter below:

charcode,cid,unicode,char
0xc1a6,0x09ef,\u529b,力
0xbfd8,0x0965,\u63a7,控
0xd6c6,0x11c5,\u5236,制
0xb9c9,0x0722,\u80a1,股
0xb7dd,0x067a,\u4efd,份
0xd3d0,0x10b5,\u6709,有
0xcfde,0x0f4b,\u9650,限
0xb9ab,0x0704,\u516c,公
0xcbbe,0x0db3,\u53f8,司
0xb6ad,0x05ec,\u8463,董
0xcac2,0x0d59,\u4e8b,事
0xbbe1,0x07f6,\u4f1a,会
0xc4ea,0x0b4d,\u5e74,年
0xd4c2,0x1105,\u6708,月
0xc8d5,0x0cb0,\u65e5,日
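For readers who want to see the two-step lookup in isolation, here is a minimal Python sketch of the chain characterCode -> CID -> Unicode. It is not the PdfPig C# implementation: the names `CODE_TO_CID`, `CID_TO_UNICODE`, `try_get_unicode`, and `decode` are hypothetical, the tables hold only three rows copied from the conversion listing above, and the fixed 2-byte code width is an assumption made for this illustration.

```python
# Hypothetical stand-in for the font's CMap: character code -> CID.
# Values are three rows copied from the conversion listing above.
CODE_TO_CID = {0xC1A6: 0x09EF, 0xBFD8: 0x0965, 0xD6C6: 0x11C5}

# Hypothetical stand-in for the UCS-2 CMap: CID -> Unicode.
CID_TO_UNICODE = {0x09EF: "\u529b", 0x0965: "\u63a7", 0x11C5: "\u5236"}


def try_get_unicode(character_code):
    """Map a character code to Unicode via its CID, or return None."""
    cid = CODE_TO_CID.get(character_code)
    if cid is None:
        return None  # the code is not covered by the (sample) CMap
    # Look up by CID, not by the raw character code.
    return CID_TO_UNICODE.get(cid)


def decode(data):
    """Decode a byte string, assuming every code is exactly 2 bytes wide."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[i:i + 2], "big")
        # U+FFFD marks codes this sample table cannot resolve.
        chars.append(try_get_unicode(code) or "\ufffd")
    return "".join(chars)
```

Looking up the Unicode table directly with the raw character code (for example 0xC1A6) finds nothing in this sketch, because that table is keyed by CID (0x09EF).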

[image: DebugWatchLettersAfterFix]

fnatzke, Dec 23 '22 06:12

Thank you, you are very responsible people @fnatzke @EliotJones

asural, Feb 20 '23 05:02