content-extractor
Extract meaningful content, such as text and images, from PDF and PSD files, and link both in a common JSON string.
I run into the following error, even though the examples are processed successfully:
  File "general.py", line 12, in <module>
    json = main.run("Programming with PDFMiner.pdf", "./images/")
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\main.py", line 82, in run
    dict_book = text_to_dict(pdf_file)...
I'm not sure how the original code could handle UTF-8 input files. Buffering characters in Unicode ensured I could convert mine, producing UTF-8 JSON output (io.cStringIO does accept Unicode while...
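The idea of buffering in Unicode and encoding only on output can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the project's actual code: the function names `build_json` and `write_json` and the `"text"` key are hypothetical, and the sketch assumes the extracted chunks arrive either as text or as UTF-8 bytes.

```python
import io
import json


def build_json(chunks):
    """Collect text chunks in a Unicode buffer and return a JSON string.

    Hypothetical helper: `chunks` may mix str and UTF-8 encoded bytes.
    """
    buffer = io.StringIO()  # text buffer, holds Unicode rather than bytes
    for chunk in chunks:
        if isinstance(chunk, bytes):
            # Assumption: byte input is UTF-8 encoded.
            chunk = chunk.decode("utf-8")
        buffer.write(chunk)
    # ensure_ascii=False keeps non-ASCII characters readable in the output
    # instead of escaping them as \uXXXX sequences.
    return json.dumps({"text": buffer.getvalue()}, ensure_ascii=False)


def write_json(path, payload):
    """Write the JSON string to disk, encoding as UTF-8 only at this point."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(payload)
```

Keeping everything as Unicode until the final write means the encoding is decided in a single place, which tends to avoid mixed byte/text errors.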
The euske / pdfminer repository has changed the location of the PDFDocument class, as noted in the README. The class is easy to locate again, but other things have changed as well...
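One way to tolerate a class that has moved between releases is to try several import locations in turn. The sketch below is only an illustration: `import_first` and `load_pdfdocument` are hypothetical helpers, and the two module paths (`pdfminer.pdfdocument` for newer releases, `pdfminer.pdfparser` for older ones) are an assumption about where the class has lived and should be checked against the installed version. It does not address the other changes mentioned.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module that provides it.

    Modules that cannot be imported, or that lack the attribute, are skipped.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("%s not found in any of: %s" % (name, ", ".join(module_paths)))


def load_pdfdocument():
    """Look up PDFDocument in its assumed newer location, then the older one."""
    return import_first(
        "PDFDocument",
        ["pdfminer.pdfdocument", "pdfminer.pdfparser"],  # assumed locations
    )
```

Calling `load_pdfdocument()` once at start-up and reusing the returned class keeps the version check in a single place.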