react-pdf
react-pdf copied to clipboard
Fix invalid JPEG image
Fix #2625
Before :
$ pdfcpu validate ./Downloads/document.pdf
validating(mode=relaxed) ./Downloads/document.pdf ...
dereferenceAndLoad: problem dereferencing object 12: strconv.ParseFloat: parsing "undefined": invalid syntax
After :
$ pdfcpu validate ./Downloads/document.pdf
validating(mode=relaxed) ./Downloads/document.pdf ...
validation ok
⚠️ No Changeset found
Latest commit: f1ce343c2274e62f59b86148c5dfbb76f25b9d6b
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
This PR includes no changesets
When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
Click here to learn what changesets are, and how to add one.
Click here if you're a maintainer who wants to add a changeset to this PR
thanks dude!
Is there any reason why this is still unmerged? 👀
@wojtekmaj Do you have the power to merge this fix ?
No
@diegomura please merge this fix
While this fixes the issue, I do not think the fix is correct. This has been broken by https://github.com/diegomura/react-pdf/pull/2591 because this.bits is no longer defined.
@renchap
I do not think the fix is correct
It's not really helpful without context. Can you please explain why ?
@renchap
I do not think the fix is correct
It's not really helpful without context. Can you please explain why ?
Not all JPEG files are 8 bits.
The issue comes from this change:
https://github.com/diegomura/react-pdf/pull/2591/files#diff-47b9860f6533ce7e5b0ededca5d729207841646fc81021d1366fd2de670f1e5dL41
The previous code was setting this.bits from the image metadata, and the new code (using jay-peg) no longer initialises this.bits. The correct change is to extract the bit information with jay-peg and set it.
Are you sure of that ?
I didn't find any data to be sure.
I found the BitsPerSample tag from the SOF table, but wasn't sure if it's related
I'm not a JPEG expert 😄
Does this fix the issue where some jpgs just don't show up at all, including chrome etc? ( I used sharp to convert all images to png buffers, which only fixed the images not appearing on mac etc, but not the images that didn't appear at all (These images work everywhere else).
Expected - building has images
Expected - building has no images
Not expected - building has images, it takes up the space but nothing appears
For context, when we leave it jpeg, it throws some silent errors like IPFHandler is not defined and Unknown Version 17.
I fixed these from logging by storing the images as compressed, png buffers, but now it just shows nothing and logs nothing. Once i run the jpeg through squoosh, it loads, even if the filesize output is larger
Since yarn resolutions is not working to install exact version of @react-pdf/pdfkit and @react-pdf/renderer as a dependency of @react-pdf/renderer. we can try to manually add it in dependencies in package.json then run yarn install
also specify the exact version of @react-pdf/renderer
{ "dependencies": { "@react-pdf/image": "2.3.1", "@react-pdf/pdfkit": "3.1.2", "@react-pdf/renderer": "3.1.5" } }
NOTE: this is just a Temporary fix, until @diegomura fix the this.bits issue related to jay-jpeg adjustments.
I test it with JPEG images and it's working properly
UPDATE AS OF March 17, 2023
library download link: https://github.com/jason-migz/react-pdf-renderer-3-1-5-without-jay-peg/blob/main/react-pdf-renderer-3.1.5.zip
stackoverflow link: https://stackoverflow.com/questions/78072760/react-pdf-downloading-corrupted-pdfs/78171767#78171767
I confirm that for the example image the proposed fix works.
I believe a slightly better fix would be
diff --git a/packages/pdfkit/src/image/jpeg.js b/packages/pdfkit/src/image/jpeg.js
index 742dbcb..6f3045f 100644
--- a/packages/pdfkit/src/image/jpeg.js
+++ b/packages/pdfkit/src/image/jpeg.js
@@ -26,6 +26,7 @@ class JPEG {
}
if (marker.name === 'SOF') {
+ this.bits ||= marker.precision;
this.width ||= marker.width;
this.height ||= marker.height;
this.colorSpace ||= COLOR_SPACE_MAP[marker.numberOfComponents];
as this would in theory support 12-bits JPEG images, but I believe that most viewers cannot handle them.
In my environment, it is quite easy to test by running yarn dev, so I did a few experiments:
- the current code shows the first page as a blank page 😢
- both the fix from f1ce343c2274e62f59b86148c5dfbb76f25b9d6b and my patch show the quijote1.jpeg image and the expected text content on the first page 🥳
- I tried using a 12-bit JPEG image for testing (I used https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/testimages/testorig12.jpg ) but none of the viewers I tried was able to display it 💔
I confirm that for the example image the proposed fix works.
I believe a slightly better fix would be
diff --git a/packages/pdfkit/src/image/jpeg.js b/packages/pdfkit/src/image/jpeg.js index 742dbcb..6f3045f 100644 --- a/packages/pdfkit/src/image/jpeg.js +++ b/packages/pdfkit/src/image/jpeg.js @@ -26,6 +26,7 @@ class JPEG { } if (marker.name === 'SOF') { + this.bits ||= marker.precision; this.width ||= marker.width; this.height ||= marker.height; this.colorSpace ||= COLOR_SPACE_MAP[marker.numberOfComponents];as this would in theory support 12-bits JPEG images, but I believe that most viewers cannot handle them.
In my environment, it is quite easy to test by running
yarn dev, so I did a few experiments:
- the current code shows the first page as a blank page 😢
- both the fix from f1ce343 and my patch show the quijote1.jpeg image and the expected text content on the first page 🥳
- I tried using a 12-bit JPEG image for testing (I used https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/testimages/testorig12.jpg ) but none of the viewers I tried was able to display it 💔
@ranma42 in google chrome PDF viewer will always show JPEG images, even the this.bit is undefined :)
@ranma42 in google chrome PDF viewer will always show JPEG images, even the
this.bitisundefined:)
I know, but my point is making sure that the resulting PDF is valid/working, not getting to see its content even though it is somewhat malformed ;) In a way Chrome makes it harder to debug by just papering over the issue.
I am currently working on a windows box, so I tested:
- Chrome & Firefox: they can display the malformed PDF
- Edge: it displays the first page as blank and the remainder of the document correctly
- Adobe Reader: it loudly complains about an error while processing a page (twice); after closing the modal alerts, it shows a blank first page and the remainder of the document correctly
Any reason this isn't merged yet. JPEGs are currently corrupting pdfs if added.
+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.
+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.
This is unclear to me. As said above, not all JPEG files are 8 bits for what I know. Can someone explain why hardcoding this is a fix?
+1, surely it can't hurt rolling this even if a more correct fix exists right? This goes from broken to less (or not) broken, even if a more correct implementation exists.
This is unclear to me. As said above, not all JPEG files are 8 bits for what I know. Can someone explain why hardcoding this is a fix?
Hardcoding bits = 8 is not the proper fix, but the one mentioned in this comment seems to be the proper one: https://github.com/diegomura/react-pdf/pull/2646#issuecomment-2005160264
Agree. @ooga mind pushing that fix instead? Otherwise I can do in a separate PR
Should be fixed by https://github.com/diegomura/react-pdf/pull/2689. Thanks for the discussion here!