pdfminer is unable to parse pdf
I am using the following pdf: http://www.ema.europa.eu/docs/en_GB/document_library/Committee_meeting_report/2009/10/WC500006409.pdf
parser = PDFParser(fp)
doc = PDFDocument(parser)
parser.set_document(doc)
doc.initialize()
In [14]: doc.is_extractable
Out[14]: False
This is by design. Some PDFs explicitly disallow text extraction, and PDFMiner follows that directive. You can override it by passing check_extractable=False, but do so at your own risk.
Cool, that worked. It would be nice to have that as an option in pdf2txt.py.
@euske, this issue can be closed, right?
Hi @euske, I can't get this to work. Which of the above functions takes the "check_extractable=False" argument? All four calls fail with "got an unexpected keyword argument 'check_extractable'". I am a novice, so please excuse any mistakes :)
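For what it's worth, the keyword does not seem to belong to any of the four calls in the snippet at the top of the thread. In releases that ship a pdfminer.pdfpage module (pdfminer.six, for example), it is a parameter of PDFPage.get_pages; older releases, as far as I can tell, accepted it on process_pdf in pdfminer.pdfinterp instead. Below is a minimal sketch under the first assumption. The function name extract_text_ignoring_flag and its path argument are made up for illustration, and the module layout may differ in your installed version.

```python
from io import StringIO


def extract_text_ignoring_flag(path):
    """Return the text of the PDF at `path`, ignoring its "no extraction" flag.

    Hypothetical helper. Assumes a pdfminer release that provides
    pdfminer.pdfpage.PDFPage (e.g. pdfminer.six).
    """
    # Imported here so that this module still loads where pdfminer is absent.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    out = StringIO()
    device = TextConverter(rsrcmgr, out, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    with open(path, "rb") as fp:
        # check_extractable=False skips the permission check that makes
        # doc.is_extractable == False a hard stop.
        for page in PDFPage.get_pages(fp, check_extractable=False):
            interpreter.process_page(page)

    device.close()
    return out.getvalue()
```

With the PDF from the first post saved locally, calling extract_text_ignoring_flag("WC500006409.pdf") should return its text as a single string. As noted earlier in the thread, overriding the document's own restriction is at your own risk.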