
One-dimensional and mixed CCITT encoding not supported

Open • dhdaines opened this issue 6 months ago • 1 comment

The CCITTFaxDecode filter implementation in pdfminer.six only supports "pure two-dimensional" encoding (Group 4), which is indicated by a /K value less than zero in the filter's parameter dictionary (/DecodeParms).

At the very least it should also support the default mode of /K 0 for "Pure one-dimensional encoding".
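For reference, the three modes selected by /K in the PDF specification can be sketched as below. This is only an illustration: `ccitt_mode` is a hypothetical helper, not part of the pdfminer.six API, and it assumes the decode parameters have already been resolved into a plain mapping with a numeric `K`.

```python
from typing import Any, Mapping


def ccitt_mode(decode_parms: Mapping[str, Any]) -> str:
    """Name the CCITT encoding scheme selected by the /K entry.

    Hypothetical helper for illustration only; assumes `decode_parms`
    is a plain mapping built from the stream's /DecodeParms.
    """
    # /K defaults to 0 when it is absent from the dictionary.
    k = decode_parms.get("K", 0)
    if k < 0:
        # Group 4: the only mode currently handled by pdfminer.six.
        return "pure two-dimensional (Group 4)"
    if k == 0:
        # Group 3, 1-D: the default mode requested in this issue.
        return "pure one-dimensional (Group 3, 1-D)"
    # Group 3, 2-D: 1-D and 2-D coded lines are interleaved.
    return "mixed one- and two-dimensional (Group 3, 2-D)"


print(ccitt_mode({}))
print(ccitt_mode({"K": -1}))
print(ccitt_mode({"K": 4}))
```

A decoder could branch on this value up front and raise a clear "unsupported" error for the modes it does not implement, rather than failing partway through a stream.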

See this PDF: https://github.com/mozilla/pdf.js/blob/master/test/pdfs/ccitt_EndOfBlock_false.pdf

dhdaines, Jun 26 '25 14:06

You can simply take the new implementation from PLAYA-PDF. It is not 100% correct, but it is about 50% faster and extracts images without crashing in all circumstances: https://github.com/dhdaines/playa/blob/main/playa/ccitt.py

dhdaines, Jul 29 '25 14:07