react-pdf icon indicating copy to clipboard operation
react-pdf copied to clipboard

Fix invalid JPEG image

Open ooga opened this issue 1 year ago • 3 comments

Fix #2625

Before :

image
$ pdfcpu validate ./Downloads/document.pdf
validating(mode=relaxed) ./Downloads/document.pdf ...
dereferenceAndLoad: problem dereferencing object 12: strconv.ParseFloat: parsing "undefined": invalid syntax

After :

image
$ pdfcpu validate ./Downloads/document.pdf
validating(mode=relaxed) ./Downloads/document.pdf ...
validation ok

ooga avatar Feb 16 '24 09:02 ooga

⚠️ No Changeset found

Latest commit: f1ce343c2274e62f59b86148c5dfbb76f25b9d6b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

changeset-bot[bot] avatar Feb 16 '24 09:02 changeset-bot[bot]

thanks dude!

KMJ-007 avatar Feb 21 '24 11:02 KMJ-007

Is there any reason why this is still unmerged? 👀

niemisami avatar Feb 26 '24 11:02 niemisami

@wojtekmaj Do you have the power to merge this fix ?

ooga avatar Feb 27 '24 06:02 ooga

No

wojtekmaj avatar Feb 27 '24 06:02 wojtekmaj

@diegomura please merge this fix

DarpanIO avatar Feb 28 '24 11:02 DarpanIO

While this fixes the issue, I do not think the fix is correct. This has been broken by https://github.com/diegomura/react-pdf/pull/2591 because this.bits is no longer defined.

renchap avatar Mar 02 '24 17:03 renchap

@renchap

I do not think the fix is correct

It's not really helpful without context. Can you please explain why ?

ooga avatar Mar 04 '24 08:03 ooga

@renchap

I do not think the fix is correct

It's not really helpful without context. Can you please explain why ?

Not all JPEG files are 8 bits.

The issue comes from this change:

https://github.com/diegomura/react-pdf/pull/2591/files#diff-47b9860f6533ce7e5b0ededca5d729207841646fc81021d1366fd2de670f1e5dL41

The previous code was setting this.bits from the image metadata, and the new code (using jay-peg) no longer initialises this.bits. The correct change is to extract the bit information with jay-peg and set it.

renchap avatar Mar 04 '24 10:03 renchap

Are you sure of that ?

I didn't find any data to be sure. I found the BitsPerSample tag from the SOF table, but wasn't sure if it's related I'm not a JPEG expert 😄

ooga avatar Mar 04 '24 10:03 ooga

Does this fix the issue where some jpgs just don't show up at all, including chrome etc? ( I used sharp to convert all images to png buffers, which only fixed the images not appearing on mac etc, but not the images that didn't appear at all (These images work everywhere else).

Expected - building has images Screenshot 2024-03-05 at 10 32 19 am

Expected - building has no images Screenshot 2024-03-05 at 10 32 40 am

Not expected - building has images, it takes up the space but nothing appears Screenshot 2024-03-05 at 10 32 29 am

joelybahh avatar Mar 04 '24 23:03 joelybahh

For context, when we leave it jpeg, it throws some silent errors like IPFHandler is not defined and Unknown Version 17.

I fixed these from logging by storing the images as compressed, png buffers, but now it just shows nothing and logs nothing. Once i run the jpeg through squoosh, it loads, even if the filesize output is larger

joelybahh avatar Mar 04 '24 23:03 joelybahh

Since yarn resolutions is not working to install exact version of @react-pdf/pdfkit and @react-pdf/renderer as a dependency of @react-pdf/renderer. we can try to manually add it in dependencies in package.json then run yarn install

also specify the exact version of @react-pdf/renderer

{ "dependencies": { "@react-pdf/image": "2.3.1", "@react-pdf/pdfkit": "3.1.2", "@react-pdf/renderer": "3.1.5" } }

NOTE: this is just a Temporary fix, until @diegomura fix the this.bits issue related to jay-jpeg adjustments.

I test it with JPEG images and it's working properly

image

UPDATE AS OF March 17, 2023 library download link: https://github.com/jason-migz/react-pdf-renderer-3-1-5-without-jay-peg/blob/main/react-pdf-renderer-3.1.5.zip stackoverflow link: https://stackoverflow.com/questions/78072760/react-pdf-downloading-corrupted-pdfs/78171767#78171767 image

jason-migz avatar Mar 16 '24 12:03 jason-migz

I confirm that for the example image the proposed fix works.

I believe a slightly better fix would be

diff --git a/packages/pdfkit/src/image/jpeg.js b/packages/pdfkit/src/image/jpeg.js
index 742dbcb..6f3045f 100644
--- a/packages/pdfkit/src/image/jpeg.js
+++ b/packages/pdfkit/src/image/jpeg.js
@@ -26,6 +26,7 @@ class JPEG {
       }

       if (marker.name === 'SOF') {
+        this.bits ||= marker.precision;
         this.width ||= marker.width;
         this.height ||= marker.height;
         this.colorSpace ||= COLOR_SPACE_MAP[marker.numberOfComponents];

as this would in theory support 12-bits JPEG images, but I believe that most viewers cannot handle them.

In my environment, it is quite easy to test by running yarn dev, so I did a few experiments:

  • the current code shows the first page as a blank page 😢
  • both the fix from f1ce343c2274e62f59b86148c5dfbb76f25b9d6b and my patch show the quijote1.jpeg image and the expected text content on the first page 🥳
  • I tried using a 12-bit JPEG image for testing (I used https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/testimages/testorig12.jpg ) but none of the viewers I tried was able to display it 💔

ranma42 avatar Mar 18 '24 22:03 ranma42

I confirm that for the example image the proposed fix works.

I believe a slightly better fix would be

diff --git a/packages/pdfkit/src/image/jpeg.js b/packages/pdfkit/src/image/jpeg.js
index 742dbcb..6f3045f 100644
--- a/packages/pdfkit/src/image/jpeg.js
+++ b/packages/pdfkit/src/image/jpeg.js
@@ -26,6 +26,7 @@ class JPEG {
       }

       if (marker.name === 'SOF') {
+        this.bits ||= marker.precision;
         this.width ||= marker.width;
         this.height ||= marker.height;
         this.colorSpace ||= COLOR_SPACE_MAP[marker.numberOfComponents];

as this would in theory support 12-bits JPEG images, but I believe that most viewers cannot handle them.

In my environment, it is quite easy to test by running yarn dev, so I did a few experiments:

  • the current code shows the first page as a blank page 😢
  • both the fix from f1ce343 and my patch show the quijote1.jpeg image and the expected text content on the first page 🥳
  • I tried using a 12-bit JPEG image for testing (I used https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/testimages/testorig12.jpg ) but none of the viewers I tried was able to display it 💔

@ranma42 in google chrome PDF viewer will always show JPEG images, even the this.bit is undefined :)

jason-migz avatar Mar 19 '24 04:03 jason-migz

@ranma42 in google chrome PDF viewer will always show JPEG images, even the this.bit is undefined :)

I know, but my point is making sure that the resulting PDF is valid/working, not getting to see its content even though it is somewhat malformed ;) In a way Chrome makes it harder to debug by just papering over the issue.

I am currently working on a windows box, so I tested:

  • Chrome & Firefox: they can display the malformed PDF
  • Edge: it displays the first page as blank and the remainder of the document correctly
  • Adobe Reader: it loudly complains about an error while processing a page (twice); after closing the modal alerts, it shows a blank first page and the remainder of the document correctly

ranma42 avatar Mar 19 '24 07:03 ranma42

Any reason this isn't merged yet. JPEGs are currently corrupting pdfs if added.

Kayrim avatar Mar 24 '24 16:03 Kayrim

+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.

joelybahh avatar Mar 24 '24 22:03 joelybahh

+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.

This is unclear to me. As said above, not all JPEG files are 8 bits for what I know. Can someone explain why hardcoding this is a fix?

diegomura avatar Mar 25 '24 17:03 diegomura

+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.

This is unclear to me. As said above, not all JPEG files are 8 bits for what I know. Can someone explain why hardcoding this is a fix?

Hardcoding bits = 8 is not the proper fix, but the one mentioned in this comment seems to be the proper one: https://github.com/diegomura/react-pdf/pull/2646#issuecomment-2005160264

renchap avatar Mar 25 '24 17:03 renchap

Agree. @ooga mind pushing that fix instead? Otherwise I can do in a separate PR

diegomura avatar Mar 25 '24 19:03 diegomura

Should be fixed by https://github.com/diegomura/react-pdf/pull/2689. Thanks for the discussion here!

diegomura avatar Mar 26 '24 09:03 diegomura