i7j-rups

RUPS hangs "Reading the Cross-Reference table 3%" of FoxHexOne Mutation file1114.pdf

Open · petervwyatt opened this issue 3 years ago · 4 comments

iText RUPS 7.2.4 hangs while attempting to open FoxHexOne Mutation file1114.pdf (see https://github.com/pdf-association/pdf-corpora#foxhex0ne-mutations). There is no output in the console, the Debug Info pane or any other pane. Yes, this is a bad file in some way, but I was hoping to find out why.

[screenshot]

petervwyatt, Dec 08 '22

Do you happen to still have these PDF files lying around somewhere? The site is now dead and Internet Archive didn't store the PDFs themselves...

Eswcvlad, Apr 03 '25

Sorry to say, it still causes an exception with the new 25.03 release...

file1114.pdf

I get a crash dialog (using the EXE) and this:

ERROR - Cannot invoke "com.itextpdf.rups.model.TreeNodeFactory.expandNode(com.itextpdf.rups.view.itext.treenodes.PdfObjectTreeNode)" because the return value of "com.itextpdf.rups.model.ObjectLoader.getNodes()" is null

petervwyatt, Apr 04 '25

Hmm. So before it was just hanging without errors.

Now it throws an error, but there is an issue in ObjectLoader: the exception raised during the loading itself gets ignored. Since initialization fails, null is propagated further, which produces the error we see now.
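To illustrate the difference, here is a minimal sketch of a background load that keeps the task's exception instead of discarding it. It uses a plain `ExecutorService` so that it runs on its own; RUPS's `ObjectLoader` is a `SwingWorker` (as the stack trace further down shows), where the analogous step would be calling `get()` in `done()`, which rethrows a failure from `doInBackground()` wrapped in an `ExecutionException`. The names `LoaderSketch`, `LoadResult` and `load` are hypothetical and not part of RUPS.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoaderSketch {
    /** Hypothetical result holder: either the loaded value or the exception that prevented it. */
    static final class LoadResult<T> {
        final T value;
        final Throwable error;

        private LoadResult(T value, Throwable error) {
            this.value = value;
            this.error = error;
        }

        boolean isOk() {
            return error == null;
        }
    }

    /** Runs the task on a background thread and waits for it, keeping any failure instead of dropping it. */
    static <T> LoadResult<T> load(Callable<T> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return new LoadResult<>(executor.submit(task).get(), null);
        } catch (ExecutionException e) {
            // get() wraps the task's exception; keep the original cause for reporting.
            return new LoadResult<>(null, e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return new LoadResult<>(null, e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        LoadResult<String> result = load(() -> {
            throw new IllegalStateException("Dictionary key R is not a name.");
        });
        if (result.isOk()) {
            System.out.println("Loaded: " + result.value);
        } else {
            // Show the real failure rather than passing null on to the tree-building code.
            System.out.println("Loading failed: " + result.error.getMessage());
        }
    }
}
```

Returning an explicit failure lets the caller show the real cause, rather than handing `null` to code further down the line such as the `expandNode` call in the error above.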

The actual error is coming from the tokenizer in iText:

com.itextpdf.io.exceptions.IOException: Error at file pointer 101,117.
	at com.itextpdf.io.source.PdfTokenizer.throwError(PdfTokenizer.java:734)
	at com.itextpdf.kernel.pdf.PdfReader.readDictionary(PdfReader.java:979)
	at com.itextpdf.kernel.pdf.PdfReader.readObject(PdfReader.java:896)
	at com.itextpdf.kernel.pdf.PdfReader.readObject(PdfReader.java:847)
	at com.itextpdf.kernel.pdf.PdfReader.readObject(PdfReader.java:1493)
	at com.itextpdf.kernel.pdf.PdfReader.readObject(PdfReader.java:1497)
	at com.itextpdf.kernel.pdf.PdfReader.readObject(PdfReader.java:843)
	at com.itextpdf.kernel.pdf.PdfIndirectReference.getRefersTo(PdfIndirectReference.java:109)
	at com.itextpdf.kernel.pdf.PdfIndirectReference.getRefersTo(PdfIndirectReference.java:113)
	at com.itextpdf.kernel.pdf.PdfIndirectReference.getRefersTo(PdfIndirectReference.java:93)
	at com.itextpdf.kernel.pdf.PdfDocument.getPdfObject(PdfDocument.java:452)
	at com.itextpdf.rups.model.IndirectObjectFactory.storeNextObject(IndirectObjectFactory.java:139)
	at com.itextpdf.rups.model.ObjectLoader.doInBackground(ObjectLoader.java:140)
	at com.itextpdf.rups.model.ObjectLoader.doInBackground(ObjectLoader.java:56)
	at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.itextpdf.io.exceptions.IOException: Dictionary key R is not a name.
	... 20 more

So it trips over this part: [screenshot]

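That is, it parses the dictionary key-value pair as /Parent#01#FBC#E3 0 (key /Parent#01#FBC#E3, value 0), which seems correct in this case: an indirect reference needs two integers (object number and generation number) before R, and here only one precedes it, so 0 is read as a plain number. It then encounters R where the next key should be, and since R is not a name, it bails.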
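The parsing step described above can be reproduced with a toy tokenizer. This is a hypothetical sketch covering only a tiny subset of PDF syntax (dictionaries, names, integers and bare keywords); it is not iText's `PdfTokenizer`, and the error text simply mirrors the message in the trace. The sample dictionaries are reconstructions from the key and value quoted in this comment, and the object number 12 in the well-formed case is made up. It also decodes the `#xx` escapes, showing that the key is the bytes `Parent` followed by 0x01, 0xFB, `C` and 0xE3.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictSketch {
    /** Splits a tiny subset of PDF syntax into tokens: "<<", ">>", names and bare words. */
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (s.startsWith("<<", i) || s.startsWith(">>", i)) {
                tokens.add(s.substring(i, i + 2));
                i += 2;
                continue;
            }
            int start = i++;  // a name starts with '/', anything else is a bare word
            while (i < s.length() && !Character.isWhitespace(s.charAt(i)) && "/<>".indexOf(s.charAt(i)) < 0) i++;
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    /** Decodes #xx hex escapes in a name (given without the leading '/') into raw bytes. */
    static byte[] decodeName(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length()) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    static boolean isInt(String t) {
        return t.matches("[+-]?\\d+");
    }

    /** Parses "<< key value ... >>"; a value is a single token or an indirect reference "n g R". */
    static Map<String, String> parseDict(List<String> t) {
        if (!t.get(0).equals("<<")) throw new IllegalArgumentException("Expected <<");
        Map<String, String> dict = new LinkedHashMap<>();
        int i = 1;
        while (!t.get(i).equals(">>")) {
            String key = t.get(i++);
            if (!key.startsWith("/")) throw new IllegalArgumentException("Dictionary key " + key + " is not a name.");
            String value = t.get(i++);
            // An indirect reference needs two integers followed by R; one integer alone is just a number.
            if (isInt(value) && i + 1 < t.size() && isInt(t.get(i)) && t.get(i + 1).equals("R")) {
                value = value + " " + t.get(i) + " R";
                i += 2;
            }
            dict.put(key, value);
        }
        return dict;
    }

    public static void main(String[] args) {
        System.out.println(parseDict(tokenize("<< /Parent 12 0 R >>")));  // {/Parent=12 0 R}
        System.out.println(decodeName("Parent#01#FBC#E3").length);        // 10
        try {
            parseDict(tokenize("<< /Parent#01#FBC#E3 0 R >>"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // Dictionary key R is not a name.
        }
    }
}
```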

Overall, this looks like correct behavior from iText, though it is a bit disappointing that it prevents RUPS from showing anything else in the document.

But we definitely need to fix the error message in RUPS.
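One way to improve the message is to report the whole cause chain rather than only the outermost exception, since here the useful detail ("Dictionary key R is not a name.") sits in the cause of "Error at file pointer 101,117.". A minimal sketch, assuming standard `Throwable.getCause()` chaining; `describe` is a hypothetical helper, and the sample exceptions are plain JDK exceptions standing in for iText's `com.itextpdf.io.exceptions.IOException`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ErrorMessageSketch {
    /** Joins the messages of an exception and its causes, outermost first. */
    static String describe(Throwable t) {
        List<String> parts = new ArrayList<>();
        // Identity set guards against cause cycles.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            parts.add(msg != null ? msg : cur.getClass().getSimpleName());
        }
        return String.join(" Caused by: ", parts);
    }

    public static void main(String[] args) {
        Exception cause = new IllegalArgumentException("Dictionary key R is not a name.");
        Exception outer = new RuntimeException("Error at file pointer 101,117.", cause);
        // Prints: Error at file pointer 101,117. Caused by: Dictionary key R is not a name.
        System.out.println(describe(outer));
    }
}
```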

Eswcvlad, Apr 04 '25

I took a quick look at this and have created a PR (https://github.com/itext/rups/pull/193). @Eswcvlad Vlad, I saw you also have a PR up for this, so I hope I'm not stepping on your toes.

It looks like the code goes through the same error path twice. The first time, it tries to fix the xref; on the second pass it encounters what looks like the same error again, so it gives up and throws the exception. Perhaps a bigger, better fix would be to find out why fixing the xref fails. Anyway, my PR just adds a little more detail to the information shown and ensures we communicate the exception to the user, instead of continuing in the storeNextObject loop and eventually throwing an NPE. We also manage to load some information about the file, although of course we shouldn't trust all of it to be 'correct', as we know the PDF mentioned in this issue is structurally unsound (a polite way of saying it's bad!).
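The behavior described here (one repair attempt, then giving up and reporting the failure instead of carrying on) can be modelled with a small loop. This is a hypothetical sketch, not the code in iText or in the PR: `loadAll`, `readObject` and `repairXref` are illustrative stand-ins for reading each indirect object and for rebuilding the cross-reference table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class RepairRetrySketch {
    /** Objects read before the first unrecoverable failure, plus that failure (null if none). */
    static final class Outcome {
        final List<String> loaded = new ArrayList<>();
        RuntimeException error;
    }

    /**
     * Hypothetical loading loop over objects 1..count. The first failure triggers one repair
     * (e.g. rebuilding the xref) and a retry of the same object; a failure after that stops
     * the loop and is kept for the user instead of being swallowed.
     */
    static Outcome loadAll(int count, IntFunction<String> readObject, Runnable repairXref) {
        Outcome outcome = new Outcome();
        boolean repaired = false;
        for (int n = 1; n <= count; n++) {
            try {
                outcome.loaded.add(readObject.apply(n));
            } catch (RuntimeException first) {
                if (repaired) {  // already repaired once: give up
                    outcome.error = first;
                    return outcome;
                }
                repaired = true;
                repairXref.run();
                try {
                    outcome.loaded.add(readObject.apply(n));  // second pass over the same object
                } catch (RuntimeException second) {
                    outcome.error = second;  // same error path again: report it
                    return outcome;
                }
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        // Object 3 is broken and stays broken after the repair.
        Outcome result = loadAll(5, n -> {
            if (n == 3) {
                throw new IllegalStateException("Dictionary key R is not a name.");
            }
            return "object " + n;
        }, () -> System.out.println("repairing xref..."));
        // Prints: [object 1, object 2] / Dictionary key R is not a name.
        System.out.println(result.loaded + " / " + result.error.getMessage());
    }
}
```

Keeping the objects read before the failure is one way to still show some information about the file, while the stored exception is what would be reported to the user.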

[screenshots]

YoungJules, Jul 23 '25