content-extractor
pdfminer has changed its API and broken some imports
The euske/pdfminer repository has moved the PDFDocument class to a different module (pdfminer.pdfdocument), as noted in its README. That class is easy to find again, but other things have changed too, as the following error message shows: process_pdf can no longer be imported from pdfminer.pdfinterp. I will not pursue this any further and will use pdfminer directly instead.
[..]/pdfsplitter/content_extractor/pdfreader/util/convert.py in <module>()
2 from pdfminer.pdfparser import PDFParser
3 from pdfminer.pdfdocument import PDFDocument
----> 4 from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter, process_pdf
5 from pdfminer.pdfdevice import PDFDevice, TagExtractor
6 from pdfminer.converter import XMLConverter, HTMLConverter, TextConverter
ImportError: cannot import name process_pdf
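A minimal compatibility sketch, assuming a pdfminer release that ships pdfminer.pdfpage.PDFPage.get_pages (newer euske/pdfminer releases and the pdfminer.six fork do). The removed process_pdf can be approximated by iterating over the pages and handing each one to a PDFPageInterpreter. The name process_pdf_compat is hypothetical, and this is not necessarily what the linked pull request does.

```python
def process_pdf_compat(rsrcmgr, device, fp, pagenos=None, maxpages=0,
                       password="", caching=True, check_extractable=True):
    """Hypothetical stand-in for the removed pdfminer.pdfinterp.process_pdf.

    Feeds every page yielded by PDFPage.get_pages to a PDFPageInterpreter
    that renders into `device`. Closing the device is left to the caller.
    """
    # Imported lazily so this module still loads when pdfminer is missing.
    from pdfminer.pdfinterp import PDFPageInterpreter
    from pdfminer.pdfpage import PDFPage

    interpreter = PDFPageInterpreter(rsrcmgr, device)
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,
                                  password=password, caching=caching,
                                  check_extractable=check_extractable):
        interpreter.process_page(page)


try:
    # Older pdfminer releases still export process_pdf from pdfinterp.
    from pdfminer.pdfinterp import process_pdf
except ImportError:
    # Newer releases (or pdfminer not installed): fall back to the shim.
    process_pdf = process_pdf_compat
```

With this in place, convert.py can import only PDFResourceManager and PDFPageInterpreter from pdfminer.pdfinterp and keep calling process_pdf(rsrcmgr, device, fp) as before.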
https://github.com/Micka33/content-extractor/pull/2 should fix this problem