pdfreader
Failure to extract image as Pillow image ("Not enough image data")
I can't upload the Python script as a .py file, so I appended a .txt extension. Running the script on the sample file as shown below produces the following traceback.
- test script: test.py
- sample PDF file: pdfreader-fail-1.pdf
$ ./test.py pdfreader-fail-1.pdf
ERROR:root:Skipping broken stream
Traceback (most recent call last):
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/filters/lzw.py", line 29, in decode
    data = decompress(data)
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/filters/lzw.py", line 44, in decompress
    return decoder.decodefrombytes(compressed_bytes)
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/filters/lzw.py", line 72, in decodefrombytes
    clearbytes = self._decoder.decode(codepoints)
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/filters/lzw.py", line 199, in decode
    decoded += self._decode_codepoint(cp)
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/filters/lzw.py", line 227, in _decode_codepoint
    raise ValueError("End of information code not supported directly by this Decoder")
ValueError: End of information code not supported directly by this Decoder
Traceback (most recent call last):
  File "./test.py", line 9, in <module>
    image = viewer.canvas.images[name].to_Pillow()
  File "/home/rhlisch/.local/lib/python3.7/site-packages/pdfreader/pillow.py", line 82, in to_Pillow
    img = Image.frombytes(cs, size, bytes(self.filtered))
  File "/home/rhlisch/.local/lib/python3.7/site-packages/PIL/Image.py", line 2843, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/home/rhlisch/.local/lib/python3.7/site-packages/PIL/Image.py", line 798, in frombytes
    raise ValueError("not enough image data")
ValueError: not enough image data
@lisch The image is LZW-encoded, and the LZW decoder fails here: https://github.com/maxpmaxp/pdfreader/blob/30818a2083b22624310fa83eb0101aefea60741c/pdfreader/filters/lzw.py#L227
The decoder needs to support the END_OF_INFO_CODE symbol. Feel free to contribute :)
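For anyone picking this up, here is a rough standalone sketch of what handling the end-of-information code could look like. It is not pdfreader's decoder and does not use its classes; `lzw_decode` and its `early_change` parameter are hypothetical names chosen for illustration. It assumes the usual PDF LZW conventions: codes packed MSB-first, code width growing from 9 to 12 bits, 256 as the clear-table code, 257 as the end-of-information code, and an EarlyChange value of 1 by default. The key point is that code 257 stops decoding cleanly instead of raising.

```python
CLEAR_TABLE = 256   # resets the string table and the code width
END_OF_INFO = 257   # marks the end of the compressed data


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode an LZW stream, stopping at the end-of-information code.

    Hypothetical helper, not part of pdfreader. Assumes MSB-first packing
    and 9..12-bit codes.
    """
    # Entries 0..255 are single bytes; 256 and 257 are reserved placeholders.
    table = [bytes([i]) for i in range(256)] + [b"", b""]
    out = bytearray()
    code_len = 9
    prev = None          # previously emitted string, None right after a clear
    bit_buf = 0
    bit_cnt = 0

    for byte in data:
        bit_buf = (bit_buf << 8) | byte
        bit_cnt += 8
        while bit_cnt >= code_len:
            # Take the next code from the top of the bit buffer.
            bit_cnt -= code_len
            code = (bit_buf >> bit_cnt) & ((1 << code_len) - 1)
            bit_buf &= (1 << bit_cnt) - 1

            if code == END_OF_INFO:
                # Stop here; any trailing padding bits or bytes are ignored.
                return bytes(out)
            if code == CLEAR_TABLE:
                del table[258:]
                code_len = 9
                prev = None
                continue

            if code < len(table):
                entry = table[code]
            elif code == len(table) and prev is not None:
                # Code not in the table yet: previous string + its first byte.
                entry = prev + prev[:1]
            else:
                raise ValueError("invalid LZW code %d" % code)

            if prev is not None and len(table) < 4096:
                table.append(prev + entry[:1])
            out += entry
            prev = entry

            # Widen the code one entry early when early_change is 1.
            if code_len < 12 and len(table) + early_change >= (1 << code_len):
                code_len += 1

    # No end-of-information code seen: return what was decoded so far.
    return bytes(out)
```

As a quick check, `lzw_decode(bytes.fromhex("800B6050220C0C8501"))` decodes the sample sequence from the LZWDecode section of the PDF specification (codes 256, 45, 258, 258, 65, 259, 66, 257) to `b"-----A---B"`. Returning the partial output when the stream has no end-of-information code is a design choice of this sketch; a stricter decoder could raise instead.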