pdfminer.six
One-dimensional and mixed CCITT encoding not supported
pdfminer.six only supports "pure two-dimensional" encoding in its CCITTFaxDecode filter implementation, as represented by a /K value less than zero in the filter dictionary.
At the very least it should also support the default mode of /K 0 for "Pure one-dimensional encoding".
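For reference, here is a minimal sketch of how the /K entry selects the encoding scheme, following the PDF specification's description of the CCITTFaxDecode parameters. The function name `ccitt_mode` and the plain-dict representation of the filter parameters are hypothetical, chosen for illustration; this is not pdfminer.six's actual API.

```python
from typing import Any, Mapping, Optional


def ccitt_mode(params: Optional[Mapping[str, Any]]) -> str:
    """Classify the CCITT encoding scheme from the /K entry.

    `params` is a hypothetical plain mapping standing in for the
    filter's decode parameters; it is not a pdfminer.six object.
    """
    # /K defaults to 0 when the entry (or the whole dictionary) is absent.
    k = int((params or {}).get("K", 0))
    if k < 0:
        # The only scheme currently handled, per this report.
        return "pure two-dimensional (Group 4)"
    if k == 0:
        return "pure one-dimensional (Group 3, 1-D)"
    # k > 0: up to k - 1 two-dimensional lines may follow each 1-D line.
    return "mixed one- and two-dimensional (Group 3, 2-D)"
```

A decoder that dispatches on this value could at least fail with a clear "unsupported mode" message for the cases it does not handle, rather than producing garbage or crashing.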
See this PDF: https://github.com/mozilla/pdf.js/blob/master/test/pdfs/ccitt_EndOfBlock_false.pdf
You could simply adopt the newer implementation from PLAYA-PDF, which is not 100% correct but is about 50% faster and extracts images without crashing in all circumstances: https://github.com/dhdaines/playa/blob/main/playa/ccitt.py