pyxpdf

Error when extracting text with Chinese characters: any idea how to resolve it?

Open: marcusau opened this issue 3 years ago • 1 comment

Hi, I am using this module to extract text (a mix of English and Chinese) from PDF files, and I get the following error:

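from pyxpdf import Document, Page, Config
from pyxpdf.xpdf import TextOutput, TextControl, page_iterator

# pdf_file is the path to the input PDF (its value is not shown in this report)
with open(pdf_file, 'rb') as fp:
    doc = Document(fp)

# Search each page for 'Cornerstone' inside the given box and print any hit
for page in doc:
    res_box = page.find_text('Cornerstone', search_box=[0, 0, 400, 400], case_sensitive=True)
    if res_box:
        print(page.label, res_box)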

results:

Syntax Error: Unknown character collection 'Adobe-CNS1'
278 (406.8096, 94.85200000000002, 465.46160000000003, 104.47700000000002)
Syntax Error: Unknown character collection 'Adobe-CNS1'
279 (69.6101, 103.50040000000014, 106.93410000000002, 109.62540000000014)
280 (230.7095, 348.65500000000003, 284.4775, 358.28000000000003)
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'
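
To make the output above easier to read, here is a minimal sketch in plain Python (no pyxpdf needed) that separates the error lines from the actual hits. The `summarize` helper and the `MATCH_RE` pattern are hypothetical names used only for illustration, and the pasted text is the same output as above. As far as this output shows, the search still returns boxes for pages 278 to 280 and the 'Adobe-CNS1' message is printed five times alongside them.

```python
import re

# Output copied verbatim from the run above.
OUTPUT = """\
Syntax Error: Unknown character collection 'Adobe-CNS1'
278 (406.8096, 94.85200000000002, 465.46160000000003, 104.47700000000002)
Syntax Error: Unknown character collection 'Adobe-CNS1'
279 (69.6101, 103.50040000000014, 106.93410000000002, 109.62540000000014)
280 (230.7095, 348.65500000000003, 284.4775, 358.28000000000003)
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'
"""

# A hit line looks like "<page label> (x0, y0, x1, y1)".
MATCH_RE = re.compile(r"^(\S+) \(([^)]*)\)$")


def summarize(output):
    """Return (error_count, {page_label: box}) for the pasted output."""
    errors = 0
    matches = {}
    for line in output.splitlines():
        if line.startswith("Syntax Error:"):
            errors += 1
            continue
        m = MATCH_RE.match(line)
        if m:
            label, coords = m.groups()
            # Convert the four comma-separated coordinates to floats.
            matches[label] = tuple(float(c) for c in coords.split(","))
    return errors, matches


errors, matches = summarize(OUTPUT)
print(errors, sorted(matches))
```

This only summarizes what was printed; it does not explain or fix the 'Adobe-CNS1' message itself.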

marcusau, Mar 22 '21 10:03