OpenPDF
NullPointerException using PdfTextExtractor
Hello, I'm running into an NPE when using PdfTextExtractor with a file produced by a third party. The code has worked for a while, but it seems that the third party has updated something and I'm now getting the NPE:
java.lang.NullPointerException: Cannot invoke "com.lowagie.text.pdf.PdfDictionary.getAsDict(com.lowagie.text.pdf.PdfName)" because "resources" is null
at com.lowagie.text.pdf.parser.PdfContentStreamHandler$SetTextFont.invoke(PdfContentStreamHandler.java:599)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.processContent(PdfContentStreamHandler.java:989)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.invoke(PdfContentStreamHandler.java:976)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
at com.lowagie.text.pdf.parser.PdfTextExtractor.processContent(PdfTextExtractor.java:218)
at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:199)
at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:177)
I'm getting the error with version 2.0.2.
The problem seems to be on this line, because resource2 is null:
https://github.com/LibrePDF/OpenPDF/blob/00afd24a1e44520dc929187cf3840381f5ea8160/openpdf/src/main/java/com/lowagie/text/pdf/parser/PdfContentStreamHandler.java#L968
The error seems similar to #650
Please let me know if you need further information to help troubleshooting this, thanks in advance!
Hello, can you please share a PDF file where this problem occurs? This will make it easier to make a fix.
The issue you are encountering is related to the resources dictionary sometimes being null. This typically happens if the page does not contain a resources dictionary directly. However, the resources dictionary might be inherited from the parent pages (for example, from a "Pages" dictionary).
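To make the inheritance idea concrete, here is a minimal sketch of the lookup: check the node's own resources dictionary and, if it is missing, walk up the parent chain until one is found. It deliberately uses a hypothetical stand-in class (`Dict`, with string keys "Resources" and "Parent") rather than OpenPDF's own types, so it is a model of the technique and not the library's implementation. In OpenPDF the corresponding calls would presumably be `PdfDictionary.getAsDict(PdfName.RESOURCES)` and `getAsDict(PdfName.PARENT)`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class InheritedResourcesSketch {

    /** Hypothetical stand-in for a PDF dictionary (not an OpenPDF class). */
    static final class Dict {
        private final Map<String, Object> entries = new HashMap<>();

        Dict put(String key, Object value) {
            entries.put(key, value);
            return this;
        }

        /** Returns the entry as a dictionary, or null if absent or of another type. */
        Dict getAsDict(String key) {
            Object value = entries.get(key);
            return (value instanceof Dict) ? (Dict) value : null;
        }
    }

    /**
     * Returns the nearest "Resources" dictionary, starting at the given node
     * and following "Parent" links upwards. Returns null if none is found.
     */
    static Dict resolveResources(Dict node) {
        // Dict does not override equals/hashCode, so this set compares by identity.
        // It guards against a malformed file whose Parent links form a cycle.
        Set<Dict> visited = new HashSet<>();
        Dict current = node;
        while (current != null && visited.add(current)) {
            Dict resources = current.getAsDict("Resources");
            if (resources != null) {
                return resources;
            }
            current = current.getAsDict("Parent");
        }
        return null;
    }
}
```

Callers still need to handle a null result, since a malformed document may not define resources anywhere in the chain.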
Pull requests welcome!
Thank you for the answer. The document contains confidential information, so unfortunately I can't share it here. I tried making a fix based on your suggestion to look for a "Pages" dictionary, but ran into the problem that Eclipse won't open the project because the Maven module "openpdf" has the same name as the project "OpenPDF". I don't have good connectivity where I am now; I'll try with IntelliJ.
It does not seem to crash with this change: https://github.com/LibrePDF/OpenPDF/commit/6b515217fd8884ece3e1c2b975730629662265d6 It might be a misguided fix, because I don't quite know what the code is supposed to do :)