MalformedPDFError Invalid filter algorithm 31

Open ollym opened this issue 3 years ago • 3 comments

PDF file: EA9DDBD4F46B6A41F4CFC7FE3A222FAF8013C3CEAC0918D1E2A5.pdf

There seems to be an issue with the png_depredict function when running this code:

PDF::Reader.new(file).pages[0].xobjects[:I3].unfiltered_data

# => PDF::Reader::MalformedPDFError (Invalid filter algorithm 31):

That specific xobject is the QR code we're trying to extract and parse, but we're struggling to get the unfiltered_data we need to do so. We'll keep trying to debug it, but may need someone else's help.

ollym avatar Oct 19 '22 13:10 ollym

The image xobject looks like this:

<</Type /XObject
/Subtype /Image
/Width 100
/Height 100
/ColorSpace [/Indexed /DeviceRGB 1 23 0 R]
/BitsPerComponent 1
/Filter /FlateDecode
/DecodeParms <</Predictor 15 /Colors 1 /BitsPerComponent 1 /Columns 100>>
/Length 265>>

I'm fairly sure it's accurate that 31 isn't a valid filter type in the PNG format, but I suspect png_depredict isn't parsing the data correctly and it shouldn't be getting as far as thinking there's a filter type of 31. Maybe because it's a single bit per component? Or maybe because the colour space is indexed 🤔
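
To illustrate the single-bit-per-component theory, here is a minimal Ruby sketch. It is not pdf-reader's code: png_row_bytes and png_filter_bytes are hypothetical helpers, and the sample data is synthetic. With PNG predictors each row starts with one filter-type byte (0 to 4) and the pixel data is padded up to a whole byte. If a row length is rounded down instead of up, the reader drifts out of step and starts treating pixel bytes as filter types, which is one way a value like 31 could show up. Whether that is what actually happens here is unconfirmed.

```ruby
# Hypothetical helper (not part of pdf-reader): bytes of pixel data per row,
# rounded UP to a whole byte because PNG rows are padded to a byte boundary.
def png_row_bytes(columns:, colors:, bits_per_component:)
  (columns * colors * bits_per_component + 7) / 8
end

# Hypothetical helper: collect the leading filter-type byte of every row,
# given the number of pixel-data bytes the caller believes each row has.
def png_filter_bytes(data, row_bytes)
  stride = row_bytes + 1 # one filter-type byte precedes each row
  data.bytes.each_slice(stride).map(&:first)
end

# DecodeParms from the xobject above.
correct = png_row_bytes(columns: 100, colors: 1, bits_per_component: 1) # 13
floored = (100 * 1 * 1) / 8 # 12, what plain integer division would give

# Synthetic data: three rows, filter type 0, every pixel byte set to 0x1F (31).
sample = ([0] + [0x1F] * correct).pack("C*") * 3

good_filters = png_filter_bytes(sample, correct) # all valid filter types
bad_filters  = png_filter_bytes(sample, floored) # pixel bytes read as filters
```

With the rounded-up row length every filter byte in the sample is 0. With the floored length the second "filter byte" is already a pixel byte with value 31.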

Unfortunately I'm fairly swamped at the moment with day job and family life, so I won't be able to take a closer look for a while. Sorry!

yob avatar Oct 20 '22 13:10 yob

Ouch, this has reminded me that there's only a single unit spec for the Flate filter with PNG shaped data 😬

https://github.com/yob/pdf-reader/blob/946559b06a381ba2651fd037afc95a24309e94e4/spec/reader/filter/flate_spec.rb#L54-L71

yob avatar Oct 20 '22 13:10 yob

For anyone else hitting this issue, we found that HexaPDF was able to export the image correctly: https://github.com/gettalong/hexapdf

ollym avatar Oct 23 '22 12:10 ollym