OpenPDF NullPointerException using PdfTextExtractor

Hello, I'm running into an NPE when using PdfTextExtractor with a file produced by a third party. The code worked for a while, but the third party seems to have updated something, and now I'm getting the NPE.

java.lang.NullPointerException: Cannot invoke "com.lowagie.text.pdf.PdfDictionary.getAsDict(com.lowagie.text.pdf.PdfName)" because "resources" is null
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$SetTextFont.invoke(PdfContentStreamHandler.java:599)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.processContent(PdfContentStreamHandler.java:989)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.invoke(PdfContentStreamHandler.java:976)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.processContent(PdfTextExtractor.java:218)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:199)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:177)

I'm getting the error with version 2.0.2.

The problem seems to be on this line, because resource2 is null: https://github.com/LibrePDF/OpenPDF/blob/00afd24a1e44520dc929187cf3840381f5ea8160/openpdf/src/main/java/com/lowagie/text/pdf/parser/PdfContentStreamHandler.java#L968
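To make the failure mode concrete, here is a minimal sketch of a null-safe lookup. It assumes (a reading of the stack trace, not verified against the OpenPDF source) that the null `resources` belongs to a form XObject invoked with the `Do` operator that has no `/Resources` entry of its own. PDF dictionaries are modelled as plain `java.util.Map`s rather than OpenPDF's `PdfDictionary`, and the names `XObjectResources`, `effectiveResources` and `lookupFont` are hypothetical. The fallback to the caller's resources follows the older PDF convention where a form without its own resources uses those of the page or form it is drawn on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java model: each PDF dictionary is a Map keyed by the
// name without its leading slash. This is NOT OpenPDF's PdfDictionary API.
public class XObjectResources {

    /**
     * Resources to parse a form XObject's content with: its own /Resources
     * if present, otherwise the resources of whoever invoked it via "Do".
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> effectiveResources(Map<String, Object> xobject,
                                                  Map<String, Object> callerResources) {
        Object own = xobject.get("Resources");
        if (own instanceof Map) {
            return (Map<String, Object>) own;
        }
        return callerResources; // may itself be null, so callers must still check
    }

    /** Null-safe equivalent of looking up a font name in resources' /Font dictionary. */
    @SuppressWarnings("unchecked")
    static Object lookupFont(Map<String, Object> resources, String fontName) {
        if (resources == null) {
            return null; // no resources at all: report "not found" instead of throwing an NPE
        }
        Object fonts = resources.get("Font");
        if (!(fonts instanceof Map)) {
            return null; // resources exist but have no /Font dictionary
        }
        return ((Map<String, Object>) fonts).get(fontName);
    }

    public static void main(String[] args) {
        Map<String, Object> f1 = new HashMap<>();
        f1.put("BaseFont", "Helvetica");
        Map<String, Object> fonts = new HashMap<>();
        fonts.put("F1", f1);
        Map<String, Object> pageResources = new HashMap<>();
        pageResources.put("Font", fonts);

        // A form XObject without its own /Resources entry.
        Map<String, Object> form = new HashMap<>();
        form.put("Subtype", "Form");

        Map<String, Object> res = effectiveResources(form, pageResources);
        System.out.println(lookupFont(res, "F1"));  // prints {BaseFont=Helvetica}
        System.out.println(lookupFont(null, "F1")); // prints null
    }
}
```

How this maps onto `PdfContentStreamHandler` is an assumption; the sketch only illustrates the lookup order.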

The error seems similar to #650.

Please let me know if you need further information to help troubleshooting this, thanks in advance!

Aug 19 '24 08:08 gtoison

Hello, can you please share a PDF file where this problem occurs? This will make it easier to make a fix.

The issue you are encountering is caused by the resources dictionary sometimes being null. This typically happens when the page does not contain a resources dictionary directly; instead, the resources dictionary may be inherited from a parent node in the page tree (for example, a "Pages" dictionary).
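The inheritance described above can be sketched as a walk up the page tree through `/Parent` links until a node with `/Resources` is found. This is a plain-Java model that uses `Map`s in place of OpenPDF's `PdfDictionary`; `InheritedResources` and `findResources` are hypothetical names, not part of the OpenPDF API. An identity-based visited set stops the walk if a malformed file contains a `/Parent` cycle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each page-tree node is a Map keyed by PDF name
// (without the leading slash). Not OpenPDF's PdfDictionary API.
public class InheritedResources {

    /** Returns the nearest /Resources on the node or any ancestor, or null if none. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> findResources(Map<String, Object> node) {
        // Compare nodes by identity so a /Parent cycle terminates the loop.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        while (node != null && seen.add(node)) {
            Object res = node.get("Resources");
            if (res instanceof Map) {
                return (Map<String, Object>) res; // found on this node
            }
            Object parent = node.get("Parent");
            node = (parent instanceof Map) ? (Map<String, Object>) parent : null;
        }
        return null; // nothing to inherit
    }

    public static void main(String[] args) {
        Map<String, Object> sharedResources = new HashMap<>();
        sharedResources.put("Font", new HashMap<String, Object>());

        // A "Pages" node that carries the resources...
        Map<String, Object> pagesNode = new HashMap<>();
        pagesNode.put("Type", "Pages");
        pagesNode.put("Resources", sharedResources);

        // ...and a leaf page with no /Resources of its own.
        Map<String, Object> page = new HashMap<>();
        page.put("Type", "Page");
        page.put("Parent", pagesNode);

        System.out.println(findResources(page) == sharedResources); // prints true
    }
}
```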

Pull requests welcome!

Aug 19 '24 20:08 andreasrosdal

Thank you for the answer. Unfortunately, the document contains confidential information, so I can't share it here. I tried making a fix based on your suggestion to look for a "Pages" dictionary, but ran into a problem: Eclipse won't open the project because a Maven module "openpdf" has the same name as the project "OpenPDF". I don't have good connectivity where I am right now; I'll try with IntelliJ.

Aug 21 '24 10:08 gtoison

It does not seem to crash with that change: https://github.com/LibrePDF/OpenPDF/commit/6b515217fd8884ece3e1c2b975730629662265d6

That might be a misguided fix, though, because I don't quite know what the code is supposed to do :)

Aug 21 '24 13:08 gtoison
