
NPE caused by undefined object

Open Trisia opened this issue 1 year ago • 8 comments

example:

125 0 obj
<</Tabs/S/Group<</S/Transparency/Type/Group/CS/DeviceRGB>>/Contents[69 0 R 3646 0 R 70 0 R]/Type/Page/QITE_pageid<</UF 3619 0 R/P 5/D(AA\r)/F 3620 0 R/I 3621 0 R>>/Resources<</ExtGState<</Xi10 1 0 R/GS7 3640 0 R/GS8 3641 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]/Font<</F7 3633 0 R/F8 3624 0 R/F9 3642 0 R/F1 3627 0 R/F2 3628 0 R/F3 3629 0 R/Xi11 2 0 R>>>>/Parent 82 0 R/StructParents 5/MediaBox[0 0 595.2 841.92]>>

see

QITE_pageid<</UF 3619 0 R/P 5/D(AA\r)/F 3620 0 R/I 3621 0 R>>

object 3621 0 R is not defined

When I use PDDocument.saveIncremental to save the document it causes the following error:

java.lang.NullPointerException
	at java.util.Hashtable.computeIfAbsent(Hashtable.java:1004)
	at org.apache.pdfbox.pdfwriter.COSWriter.getObjectKey(COSWriter.java:1089)
	at org.apache.pdfbox.pdfwriter.COSWriter.writeReference(COSWriter.java:1367)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1207)
	at org.apache.pdfbox.pdfwriter.COSWriter.writeDictionary(COSWriter.java:1155)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1202)
	at org.apache.pdfbox.cos.COSDictionary.accept(COSDictionary.java:1265)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:610)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:643)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:540)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:450)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1299)
	at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:413)
	at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1568)
	at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:1078)

So we could skip the objects that are not defined to make it work.
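For anyone reading along, here is a minimal sketch of what the top frame of the trace suggests, assuming the lookup in getObjectKey ends up being called with a null key. It uses only java.util.Hashtable and is not PDFBox code; NullKeyDemo, nullKeyThrows and keyOrNull are hypothetical names made up for the illustration.

```java
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    // Returns true if Hashtable.computeIfAbsent rejects a null key with an NPE.
    static boolean nullKeyThrows() {
        Map<Object, Long> objectKeys = new Hashtable<>();
        try {
            objectKeys.computeIfAbsent(null, k -> 1L);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical guard: skip the lookup entirely when there is no object.
    static Long keyOrNull(Map<Object, Long> objectKeys, Object obj) {
        return obj == null ? null : objectKeys.computeIfAbsent(obj, k -> 1L);
    }

    public static void main(String[] args) {
        System.out.println(nullKeyThrows());
    }
}
```

Hashtable does not permit null keys, so a guard of this kind (or avoiding the null in the first place) is one way the writer could get past an undefined object.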

Trisia avatar Feb 01 '24 09:02 Trisia

Does this problem also occur with 2.0? Does it also occur in ordinary saving? Can you share the file?

THausherr avatar Feb 01 '24 12:02 THausherr

The issue occurred with PDFBox 3.0.0, and I haven't tested it with 2.X.X. In fact, the PDF is encrypted and doesn't allow editing. I'm sorry, but for certain reasons, I can't provide the PDF file.

Trisia avatar Feb 02 '24 01:02 Trisia

I had a look at 2.0. One of the changes isn't needed there (2.0 doesn't use computeIfAbsent because it doesn't exist in its JDK, and it also avoids using a null, so maybe this was introduced in refactoring); the other ones are. Your changes look useful, so I'll commit them next week to give time for other opinions (COSWriter is a difficult class). I'll also add some logging.

THausherr avatar Feb 02 '24 14:02 THausherr

We have to ensure that the resulting PDF isn't (more) corrupt than the original one if indirect object references are omitted. Removing a reference from a COSArray shouldn't be a big problem (the object reference is simply missing), but removing a reference from a COSDictionary without removing the key will lead to a corrupt PDF. In such cases the key should be removed as well, or the corrupt reference should be replaced by a COSNull object.
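A rough sketch of the two options, written with plain Java collections rather than the real COS classes. Ref, PDF_NULL, cleanArray and cleanDictionary are hypothetical stand-ins for illustration only, and the set of defined object numbers is assumed to be known up front.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DanglingRefDemo {
    /** Hypothetical stand-in for an indirect reference such as "3621 0 R". */
    static final class Ref {
        final int objectNumber;
        Ref(int objectNumber) { this.objectNumber = objectNumber; }
    }

    /** Hypothetical stand-in for the PDF null object. */
    static final Object PDF_NULL = new Object();

    // A value is dangling if it is a reference to an object number that is not defined.
    static boolean isDangling(Object value, Set<Integer> defined) {
        return value instanceof Ref && !defined.contains(((Ref) value).objectNumber);
    }

    // Array case: dangling references are simply left out of the copy.
    static List<Object> cleanArray(List<Object> array, Set<Integer> defined) {
        List<Object> result = new ArrayList<>();
        for (Object item : array) {
            if (!isDangling(item, defined)) {
                result.add(item);
            }
        }
        return result;
    }

    // Dictionary case: either drop the key together with its value,
    // or keep the key and store the null object instead.
    static Map<String, Object> cleanDictionary(Map<String, Object> dict,
                                               Set<Integer> defined,
                                               boolean replaceWithNull) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : dict.entrySet()) {
            if (!isDangling(e.getValue(), defined)) {
                result.put(e.getKey(), e.getValue());
            } else if (replaceWithNull) {
                result.put(e.getKey(), PDF_NULL);
            }
        }
        return result;
    }
}
```

Applied to the QITE_pageid dictionary from the report, with 3619 and 3620 defined and 3621 missing, the first variant would drop the /I entry and the second would keep /I with a null value. Neither variant leaves a key without a value.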

lehmi avatar Feb 03 '24 12:02 lehmi

In that case I'd really like to test this with a file. I forgot to mention yesterday that I tried modifying a PDF by "blanking" an object, then loading it and calling saveIncremental(), but no problem occurred.

THausherr avatar Feb 03 '24 13:02 THausherr

I thought the issue was caused by document encryption, but after a simple test, I discovered it was not the case. The document seems to have been generated using iText 5.5.8.

0003719771 00000 n 
0003759924 00000 n 
trailer
<</Info 5 0 R/Encrypt 7894 0 R/ID [<e3d8be7479575ba4b5622d7b671d2255><64acf33dec0b965797b470cbf8346962>]/Root 75 0 R/Size 7895>>
%iText-5.5.8
startxref
3760055
%%EOF

Perhaps the original document had already lost the referenced objects.

Maybe removing the lost object references from a COSArray is a good idea.

Trisia avatar Feb 04 '24 01:02 Trisia

https://issues.apache.org/jira/browse/PDFBOX-5717 is the same problem

THausherr avatar Feb 10 '24 16:02 THausherr

Please test whether your code works with the recent changes by @lehmi with the latest snapshot build: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/

THausherr avatar Feb 12 '24 10:02 THausherr