PdfPig
Chinese text is parsed as garbled characters
We could not find the corresponding character with code 54992 in font KaiTi_GB2312.

This is the address of the file in question: https://github.com/asural/attachment/blob/master/file/issue%23455.pdf
Fix checked in.
File: src\UglyToad.PdfPig\PdfFonts\Composite\Type0Font.cs
Method: TryGetUnicode
Use the CMap to convert the characterCode to a CID first, then use ucs2CMap to convert the CID to Unicode.
characterCode ---CMap---> CID ---ucs2CMap---> Unicode
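
For anyone following along, here is a rough sketch of the two-step lookup described above. It is not the PdfPig code (that is C# in Type0Font.cs); it is only an illustration in Python. The dictionaries CODE_TO_CID and CID_TO_UNICODE and the function try_get_unicode are hypothetical stand-ins for the font's CMap and the ucs2CMap, filled with three rows from the conversion table in this thread.

```python
from typing import Optional

# Hypothetical stand-in for the font's encoding CMap: character code -> CID.
# Values are sample rows from the conversion table posted in this thread.
CODE_TO_CID = {0xC1A6: 0x09EF, 0xBFD8: 0x0965, 0xD6C6: 0x11C5}

# Hypothetical stand-in for the UCS2 CMap: CID -> Unicode character.
CID_TO_UNICODE = {0x09EF: "\u529b", 0x0965: "\u63a7", 0x11C5: "\u5236"}


def try_get_unicode(character_code: int) -> Optional[str]:
    """Map a character code to Unicode by going through the CID first."""
    cid = CODE_TO_CID.get(character_code)
    if cid is None:
        # The character code is not covered by the encoding CMap.
        return None
    # Only the CID, not the raw character code, is a valid key here.
    return CID_TO_UNICODE.get(cid)


# Two-step lookup finds the character.
print(try_get_unicode(0xC1A6))
# Using the raw character code as the key into the CID table finds nothing.
print(CID_TO_UNICODE.get(0xC1A6))
```

The last line shows why skipping the first step fails in this sketch: the CID-to-Unicode table is keyed by CID, so a raw character code does not match any entry.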
Page 1:
Text: 中航动力控制股份有限公司 2010年半年度报告 中航动力控制股份有限公司董事会 2010年8月17日
Charcode conversion for each letter below:
charcode,cid,unicode,char
0xc1a6,0x09ef,\u529b,力
0xbfd8,0x0965,\u63a7,控
0xd6c6,0x11c5,\u5236,制
0xb9c9,0x0722,\u80a1,股
0xb7dd,0x067a,\u4efd,份
0xd3d0,0x10b5,\u6709,有
0xcfde,0x0f4b,\u9650,限
0xb9ab,0x0704,\u516c,公
0xcbbe,0x0db3,\u53f8,司
0xb6ad,0x05ec,\u8463,董
0xcac2,0x0d59,\u4e8b,事
0xbbe1,0x07f6,\u4f1a,会
0xc4ea,0x0b4d,\u5e74,年
0xd4c2,0x1105,\u6708,月
0xc8d5,0x0cb0,\u65e5,日

Thank you, you are very responsible people @fnatzke @EliotJones