ingest-file
Adopt pdfminer.six
This PR might never get merged, but it explores what adopting pdfminer.six would look like, following the discussion in #42.
So far:
- Lots of problems loading images correctly (might want to submit a PR to upstream to make https://github.com/pdfminer/pdfminer.six/blob/develop/pdfminer/image.py#L72 available for inline use).
- Good performance
For JBIG2 image handling, cf. https://github.com/jorisschellekens/ptext-release/blob/master/ptext/io/read/image/read_jbig2_image_transformer.py
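
To make the image-loading problem noted above easier to reproduce, here is a minimal sketch of walking a pdfminer.six layout tree and exporting each image with `ImageWriter`. The helper names `iter_images` and `export_images` are hypothetical and not part of this PR. Recognising images by their `stream` attribute (which `LTImage` has) is an assumption made so the walker does not need pdfminer.six imported. The sketch only logs decoding failures; it does not implement the upstream change suggested above.

```python
from typing import Any, Iterator, List


def iter_images(layout_obj: Any) -> Iterator[Any]:
    """Yield image-like objects nested anywhere under a layout object.

    Assumption: an image is anything with a ``stream`` attribute (as
    pdfminer.six's LTImage has); a container is anything iterable.
    """
    if isinstance(layout_obj, (str, bytes)):
        return  # never recurse into plain strings
    if hasattr(layout_obj, "stream"):
        yield layout_obj
        return
    try:
        children = iter(layout_obj)
    except TypeError:
        return  # leaf object without children
    for child in children:
        yield from iter_images(child)


def export_images(pdf_path: str, outdir: str) -> List[str]:
    """Hypothetical helper: export every image found in ``pdf_path``."""
    # Imported lazily so iter_images stays usable without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.image import ImageWriter

    writer = ImageWriter(outdir)
    names = []
    for page in extract_pages(pdf_path):
        for image in iter_images(page):
            try:
                names.append(writer.export_image(image))
            except Exception as exc:
                # Report the failure and keep going with the next image.
                print(f"skipped {image.name}: {exc}")
    return names
```

Calling `export_images("sample.pdf", "out")` on a problematic file (both paths are placeholders) should list which images fail to decode, which may help when writing up an upstream issue.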