
For some particular pdfs, it throws IndexOutOfBoundsException

Open • alamkanak opened this issue 4 years ago • 1 comment

Describe the bug: For some PDFs, PDFTextStripper.getText() throws an IndexOutOfBoundsException.

To reproduce: Code snippet to reproduce the behavior:

// In onCreate of MainApplication.kt
PDFBoxResourceLoader.init(this)

// In a fragment
val inputStream = context.contentResolver.openInputStream(uri)
val pdDoc = PDDocument.load(inputStream)
val pdfStripper = PDFTextStripper()
val text = pdfStripper.getText(pdDoc)

PDF example: HARCOURT Invisible Umpires.pdf

Expected behavior: The text should be extracted correctly from the PDF.

Actual behavior: Running this code produces the following exception:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.get(ArrayList.java:437)
        at com.tom_roush.fontbox.cmap.CMapParser.parseBeginbfrange(CMapParser.java:373)
        at com.tom_roush.fontbox.cmap.CMapParser.parse(CMapParser.java:137)
        at com.tom_roush.pdfbox.pdmodel.font.CMapManager.parseCMap(CMapManager.java:73)
        at com.tom_roush.pdfbox.pdmodel.font.PDFont.readCMap(PDFont.java:175)
        at com.tom_roush.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:121)
        at com.tom_roush.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:86)
        at com.tom_roush.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:74)
        at com.tom_roush.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:58)
        at com.tom_roush.pdfbox.pdmodel.PDResources.getFont(PDResources.java:122)
        at com.tom_roush.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:58)
        at com.tom_roush.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:816)
        at com.tom_roush.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:473)
        at com.tom_roush.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:447)
        at com.tom_roush.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
        at com.tom_roush.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:141)
        at com.tom_roush.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:394)
        at com.tom_roush.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:321)
        at com.tom_roush.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:268)
        at com.tom_roush.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:229)
        at com.cliffweitzman.speechify2.repository.LibraryRepository.pendingRecordToText(LibraryRepository.kt:218)
        at com.cliffweitzman.speechify2.repository.LibraryRepository$pendingRecordToText$1.invokeSuspend(Unknown Source:15)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
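
The top two frames show an empty ArrayList being read at index 0 from inside parseBeginbfrange. The standalone sketch below only illustrates that failure mode and one way to guard against it. It contains no PdfBox-Android code, and the function names and the "list of tokens" shape are assumptions made up for illustration, not the parser's actual structure.

```kotlin
// Hypothetical stand-in for a parsing step that assumes at least one token.
// On an empty ArrayList this throws IndexOutOfBoundsException.
fun firstTokenUnsafe(tokens: List<ByteArray>): ByteArray = tokens[0]

// Guarded variant: returns null for an empty list instead of throwing,
// so the caller can decide to skip the malformed entry.
fun firstTokenOrNull(tokens: List<ByteArray>): ByteArray? = tokens.firstOrNull()

fun main() {
    val empty = ArrayList<ByteArray>()

    // The unguarded access fails the same way as the first frame of the trace.
    val threw = try {
        firstTokenUnsafe(empty)
        false
    } catch (e: IndexOutOfBoundsException) {
        true
    }
    println("unguarded access threw: $threw")

    // The guarded access reports the missing token without an exception.
    println("guarded access returned: ${firstTokenOrNull(empty)}")
}
```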

Environment details:

  • PdfBox-Android version: 2.0.2.0
  • Android API version: API 30

alamkanak, Aug 20 '21 17:08

Please retry with the current version; that bug was fixed a month ago. https://github.com/TomRoush/PdfBox-Android/blame/master/library/src/main/java/com/tom_roush/fontbox/cmap/CMapParser.java#L365

THausherr, Nov 24 '21 11:11
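
For apps that cannot upgrade right away, the call site from the report could be guarded so that one malformed PDF does not crash the caller. This is only a rough sketch: extractTextOrNull is a hypothetical helper, not part of PdfBox-Android, and returning null on failure is an assumed policy. It uses only the PDDocument.load and PDFTextStripper.getText calls already shown in the report, and it still expects PDFBoxResourceLoader.init to have been called beforehand.

```kotlin
import com.tom_roush.pdfbox.pdmodel.PDDocument
import com.tom_roush.pdfbox.text.PDFTextStripper
import java.io.InputStream

// Hypothetical helper (not a library API): extracts text from a PDF stream,
// returning null when the parser hits the IndexOutOfBoundsException
// described in this issue.
fun extractTextOrNull(input: InputStream): String? =
    try {
        // `use` closes the stream and the document even if extraction fails.
        input.use { stream ->
            PDDocument.load(stream).use { doc ->
                PDFTextStripper().getText(doc)
            }
        }
    } catch (e: IndexOutOfBoundsException) {
        // Assumed policy: treat the document as unreadable rather than crash.
        null
    }
```

In the fragment from the report, this could be called as extractTextOrNull(inputStream) once the nullable stream returned by openInputStream has been checked. Upgrading to a release that contains the fix remains the actual remedy.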