PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects.

Open ChunChunFan opened this issue 7 years ago • 8 comments

I am using gem 'combine_pdf', :git => 'https://github.com/boazsegev/combine_pdf.git'. When I call CombinePDF.load('file').pages[0], I get this warning: "PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects." This seems to be because of the new version (March 27).

ChunChunFan avatar Apr 19 '17 07:04 ChunChunFan

This shouldn't be a version-related warning. It should come up only for certain PDF files where parsing might be only partially supported (due to compression and encryption concerns), and it warns about the possibility of parsing errors.

Did you test the same file with previous versions?

Do you experience an actual issue with the result?

boazsegev avatar Apr 19 '17 07:04 boazsegev

I have come across PDFs which have object streams. PDFs with object streams are definitely PDF 1.5 or later, but not all PDF 1.5 documents have object streams in them.

I believe the object stream problem is an encryption problem. There are tools which could inflate a PDF with object streams into a PDF which doesn't use object streams. But this is not in CombinePDF yet.

igbanam avatar Jul 05 '17 11:07 igbanam
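
The distinction made in the comment above (a PDF 1.5 header versus a file that actually uses object streams) can be checked roughly from the raw bytes. The following is a minimal sketch, not part of CombinePDF: the helper names pdf_header_version and object_streams? are made up for illustration. It assumes the version is declared in the %PDF- header at the start of the file (a document catalog may override it) and that object streams show up as dictionaries containing /Type /ObjStm, so treat the result as a heuristic.

```ruby
# Heuristic sketch, not part of combine_pdf. Both helper names are hypothetical.

# Returns the version declared in the "%PDF-x.y" header, or nil when the
# data does not start with such a header.
def pdf_header_version(data)
  # String#b gives a binary copy, so arbitrary PDF bytes never raise
  # an "invalid byte sequence" error during matching.
  data.b[/\A%PDF-(\d+\.\d+)/, 1]
end

# Returns true when the raw bytes contain an object stream dictionary
# (/Type /ObjStm), with or without whitespace between the two names.
def object_streams?(data)
  data.b.match?(%r{/Type\s*/ObjStm})
end
```

For a file on disk, the bytes could be read with File.binread(path) and passed to both helpers before handing the file to CombinePDF.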

@igbanam , thanks for adding your knowledge to the discussion.

Could you send me an example PDF that doesn't work? One where you experience data loss when opening it with CombinePDF?

I'll be happy to try and track down the issue and see what I can do about it.

boazsegev avatar Jul 05 '17 14:07 boazsegev

I cannot release the PDF I found this out with (proprietary issues and all). But if I find another one which I can share, I will attach it here somehow.

igbanam avatar Jul 07 '17 19:07 igbanam

@igbanam thank you.

When you find one you can send, feel free to email me instead of posting on this issue. This way it won't be posted publicly (if that matters).

B.

boazsegev avatar Jul 07 '17 19:07 boazsegev

@igbanam , I have no idea if this might solve the issue, but I just released a version that includes improved support for Object Streams.

B.

boazsegev avatar Jul 07 '17 22:07 boazsegev

Hi, I don't know if this helps, but I was seeing this warning using combine_pdf v1.03, and when I upgraded to v1.0.16 the warning went away. This was when parsing this UK government document: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/512167/LIT_6872.pdf

In both cases, the warning had no visible impact, i.e. the resultant PDF seemed okay.

MikeFlint avatar Aug 26 '19 15:08 MikeFlint

Hello, today I also ran into this issue, but with: pdf = CombinePDF.parse Net::HTTP.get_response(URI.parse(url)).body

I solved it by adding the "allow_optional_content: true" argument to the parse call, like this: pdf = CombinePDF.parse(Net::HTTP.get_response(URI.parse(url)).body, allow_optional_content: true)

I am now able to combine PDFs that use PDF 1.5 object streams.

Hope this helps!

bustavo avatar Feb 17 '21 23:02 bustavo
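
The workaround reported in the last comment can be written out as a small helper. This is only a sketch under stated assumptions: pdf_data? and parse_pdf_from_url are hypothetical names, not part of CombinePDF, and the status and header checks are additions that the comment did not include. The CombinePDF.parse call and the allow_optional_content: true option are taken from the comment. Note that Net::HTTP.get_response does not follow redirects, so a redirecting URL would raise here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of combine_pdf): true when the bytes begin
# with the "%PDF-" header, so an HTML error page is not handed to the parser.
def pdf_data?(data)
  data.to_s.b.start_with?('%PDF-')
end

# Hypothetical helper: download a PDF and parse it with CombinePDF,
# passing the option reported in the thread.
def parse_pdf_from_url(url, allow_optional_content: true)
  require 'combine_pdf' # loaded lazily; the gem must be installed

  response = Net::HTTP.get_response(URI.parse(url))
  raise "Download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  raise 'Response body does not look like a PDF' unless pdf_data?(response.body)

  CombinePDF.parse(response.body, allow_optional_content: allow_optional_content)
end
```

A call such as pdf = parse_pdf_from_url(url) would then return the parsed document, which can be combined with others in the usual way.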