pdf-lib
Corrupted PDF
Hi.
This is an amazing library. Thanks a lot @Hopding. I know you've been inactive for a while but the quality of the code and the support you gave for this during your active time has been absolutely phenomenal. You don't see such fantastic support even for paid products. Good luck whatever you're up to.
My issue is this: I have a PDF file that looks like this:
[Image: Screen Shot 2021-07-31 at 9 06 17 PM]
But when I open/save it using pdf-lib, it will look like this:
[Image: Screen Shot 2021-07-31 at 9 10 36 PM]
Has anyone ever had a similar experience?
I have added a $500 bounty for anyone who can fix this.
Note that I would consider this fixed (for the bounty) only if it is fixed in pdf-lib itself, not by changing the PDF file (e.g. saving/compressing it using other programs).
@emilsedgh
I believe there is some non-critical error in the PDF file provided, since I'm not able to run it through iText RUPS to investigate the structure:
com.itextpdf.kernel.PdfException: Invalid indirect reference {0}.
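One possible reading of that exception is that some `N G R` reference in the file points at an object that is never defined. The sketch below is a naive way to look for such references in the raw text of a PDF. It is not how iText or pdf-lib parse files: `findDanglingReferences` and `ObjectId` are hypothetical names, the scan only sees objects written in plain text (anything packed into object streams is invisible to it), and it can report false positives for digits that merely look like references inside strings or streams.

```typescript
// Hypothetical helper, not part of pdf-lib or iText.
interface ObjectId {
  objectNumber: number;
  generation: number;
}

// Returns every "N G R" reference that has no matching "N G obj" header
// in the given text. Each dangling reference is reported once.
function findDanglingReferences(pdfText: string): ObjectId[] {
  // Collect the object headers, e.g. "105 0 obj".
  const defined: { [key: string]: boolean } = {};
  const headerPattern = /(\d+)\s+(\d+)\s+obj\b/g;
  let match: RegExpExecArray | null;
  while ((match = headerPattern.exec(pdfText)) !== null) {
    defined[match[1] + " " + match[2]] = true;
  }

  // Collect the indirect references, e.g. "114 0 R", and keep the ones
  // whose object number and generation were never defined.
  const seen: { [key: string]: boolean } = {};
  const dangling: ObjectId[] = [];
  const referencePattern = /(\d+)\s+(\d+)\s+R\b/g;
  while ((match = referencePattern.exec(pdfText)) !== null) {
    const key = match[1] + " " + match[2];
    if (!defined[key] && !seen[key]) {
      seen[key] = true;
      dangling.push({
        objectNumber: Number(match[1]),
        generation: Number(match[2]),
      });
    }
  }
  return dangling;
}
```

Because the generation number is part of the key, a reference such as `107 0 R` is reported as dangling when the file only defines `107 1 obj`.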
I suspect that the custom font is not properly embedded.
For example, the following field uses ArialBold:
105 0 obj
<</V (��) /DA (/ArialBold 0 Tf 0 0 0.501961 rg) /DR 114 0 R /F 4 /FT /Tx /Rect [39.5289 469.115 139.84 480.419 ] /Subtype /Widget /T (Lease MLS) /TU (Lease MLS) /Type /Annot /MK 118 0 R /Ff 0 /M (D:20210728200742Z) /AP <</N 19 0 R >> >>
endobj
107 1 obj
<</Length 0 /Subtype /Form /BBox [0 0 99.64 11.479 ] >> stream
endstream
endobj
FYI, ArialBold is not one of the StandardFonts:
https://pdf-lib.js.org/docs/api/enums/standardfonts
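To make that check concrete, here is a minimal sketch that pulls the font name out of a `/DA` string like the one in object 105 and compares it with the 14 standard font names listed on the page above. The helper names `fontNameFromDefaultAppearance` and `isStandardFont` are made up for this example and are not pdf-lib APIs. Also note the assumption baked in: the name in `/DA` is really a key into the form's `/DR` resources, so comparing it directly with standard font names is only a rough heuristic.

```typescript
// Names of the 14 standard PDF fonts (the values of pdf-lib's StandardFonts enum).
const STANDARD_FONT_NAMES: string[] = [
  "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
  "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
  "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
  "Symbol", "ZapfDingbats",
];

// Hypothetical helper: extracts the font name from a default appearance
// string such as "/ArialBold 0 Tf 0 0 0.501961 rg".
// Returns null when the string contains no "/Name size Tf" operator.
function fontNameFromDefaultAppearance(da: string): string | null {
  const match = /\/([^\s\/\[\]()<>{}%]+)\s+[-+]?\d*\.?\d+\s+Tf\b/.exec(da);
  return match ? match[1] : null;
}

// Hypothetical helper: true when the name is one of the 14 standard fonts.
function isStandardFont(name: string): boolean {
  return STANDARD_FONT_NAMES.indexOf(name) !== -1;
}

// Usage with the /DA value from object 105 quoted above.
const fieldFont = fontNameFromDefaultAppearance("/ArialBold 0 Tf 0 0 0.501961 rg");
const needsEmbeddedFont = fieldFont !== null && !isStandardFont(fieldFont);
```

For the quoted field, `fieldFont` is `"ArialBold"` and `needsEmbeddedFont` is `true`, which matches the suspicion that the file depends on a font that must be embedded or substituted.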
Are you in control of the generation of that particular PDF file, or do you just want to modify it?
I've repaired your PDF file and provided it in the following repo:
https://github.com/PhakornKiong/pdfLoadError
Hi @PhakornKiong. Good job investigating. Since other PDF software is able to recover from this situation, I'd love to see a patch that makes pdf-lib recover from it as well. For example, other PDF software is able to fall back to other fonts.
Unfortunately, I have a series of PDFs that have already been generated. My intention is to be able to use them with pdf-lib.
Thanks.
@emilsedgh does this happen if you save the document with pdfDoc.save({ useObjectStreams: false })?
Yes. The same thing happens although the results look slightly different.
Here is the PDF file for reference so this could be easily reproduced.
Is this the corrupted PDF or the original PDF?
This is the original one.
On Sep 28, 2021, at 5:28 AM, Charles Timko @.***> wrote:
Is this the corrupted PDF or the original PDF?
I also ran into this cryptic issue. I then tested with the pdfLoadError tool, but the individual lines didn't convince me as error handling. So I split the PDF document into individual pages (https://www.ilovepdf.com/split_pdf) and, curiously, the split first page is now displayed correctly. So just by splitting, the problem is gone. I hope this helps you, @Hopding.
@dcsline there is an easier solution to your problem that will come with the new release of pdf-lib (see PR #986). Hopefully, that means splitting your PDF will no longer be needed before working with pdf-lib 😃
POST /#951/:0/merg e_requests
Merge pull request #1000 from Hopding/POST /#951/:0/merg e_requests
Is this issue solved yet?