pdfminer is unable to parse pdf

Open cancan101 opened this issue 12 years ago • 4 comments

I am using the following pdf: http://www.ema.europa.eu/docs/en_GB/document_library/Committee_meeting_report/2009/10/WC500006409.pdf

parser = PDFParser(fp)
doc = PDFDocument(parser)
parser.set_document(doc)
doc.initialize()

In [14]: doc.is_extractable
Out[14]: False

cancan101, Nov 26 '13 18:11

This is by design. Some PDFs explicitly disallow text extraction, and PDFMiner honors that directive. You can override it (by passing check_extractable=False), but do so at your own risk.

euske, Nov 27 '13 10:11

Cool. That worked. It would be nice to have that as an option in pdf2txt.py.

cancan101, Nov 27 '13 18:11

@euske, this issue can be closed, right?

jorgecarleitao, Nov 16 '14 14:11

Hi @euske, I can't get this to work. Which of the above functions takes the "check_extractable=False" argument? All four functions fail with "got an unexpected keyword argument 'check_extractable'". I am a novice, so please excuse any mistakes :)

nikhilbhawsinka, Feb 11 '19 04:02
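
On the question of where check_extractable goes: none of the four calls in the original snippet accept it. A minimal sketch, assuming the pdfminer.six fork is installed, where the keyword belongs to PDFPage.get_pages rather than to PDFParser or PDFDocument. The function name extract_text_unchecked and its path argument are made up for illustration, and older releases of the original pdfminer may expose the option elsewhere. As noted earlier in the thread, bypassing the flag is at your own risk.

```python
import io


def extract_text_unchecked(path):
    """Return the text of a PDF, ignoring its 'no extraction' flag.

    Hypothetical helper for illustration; assumes pdfminer.six.
    """
    # Imported inside the function so pdfminer.six is only needed when called.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    out = io.StringIO()
    device = TextConverter(rsrcmgr, out, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    with open(path, "rb") as fp:
        # check_extractable=False skips the is_extractable check
        # instead of raising when the PDF disallows extraction.
        for page in PDFPage.get_pages(fp, check_extractable=False):
            interpreter.process_page(page)

    device.close()
    return out.getvalue()
```

Calling extract_text_unchecked("WC500006409.pdf") on a local copy of the file from the first post should then return its text as a string. The keyword is passed explicitly because its default may differ between releases.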