High memory usage when reading a specific PDF

Open arnestockmans-itp opened this issue 11 months ago • 3 comments

Describe the bug

When trying to convert PDF to text, I notice very high memory usage for one specific PDF. This might mean there's a memory leak in the library.

To Reproduce

Code to reproduce the issue:

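    import com.lowagie.text.pdf.PdfReader
    import com.lowagie.text.pdf.parser.PdfTextExtractor

    // contents is the PDF input; onPageParsed is the caller's per-page callback
    val reader = PdfReader(contents)
    val extractor = PdfTextExtractor(reader)

    // PDF page numbers are 1-based
    for (i in 1..reader.numberOfPages) {
        val text = extractor.getTextFromPage(i)
        onPageParsed(text)
    }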

When extraction reaches page 4 of the attached PDF, memory usage climbs to ~17 GB. With other PDFs, it is far lower.

Expected behavior

Text extraction should not use this much memory.

System

  • OS: macOS, Google Cloud Run
  • OpenPDF version: 2.0.3

Your real name

Arne Stockmans

Additional context

0a715c43-5b76-4bc7-9d91-38ebae902f97_paper.pdf

arnestockmans-itp, Feb 19 '25 12:02

This might mean there's a memory leak in the library.

It might mean that memory could be used more efficiently, but that is not a memory leak. Or did you see memory usage increase without being freed afterwards?

Lonzak, Feb 19 '25 13:02

This might mean there's a memory leak in the library.

It might mean that memory could be used more efficiently, but that is not a memory leak. Or did you see memory usage increase without being freed afterwards?

Indeed, you're right, I didn't word it correctly there. The memory is freed afterwards.

arnestockmans-itp, Feb 19 '25 13:02
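
The distinction drawn in this thread, high peak memory versus memory that is never released, can be checked with a rough measurement. The following is a minimal sketch and not part of OpenPDF: `HeapReport`, `usedHeapBytes` and `measureHeap` are hypothetical names, and the code relies only on the JVM's `Runtime` API. `System.gc()` is only a hint to the JVM, so the numbers are approximate.

```kotlin
// Hypothetical helper, not an OpenPDF API: heap usage around a block of work.
data class HeapReport(
    val beforeBytes: Long,    // used heap before the work started
    val afterWorkBytes: Long, // used heap right after the work (not a true peak)
    val afterGcBytes: Long    // used heap after asking the JVM to collect garbage
) {
    // Heap still in use after the GC request, relative to the baseline.
    // A value near zero suggests high temporary usage rather than a leak.
    val retainedBytes: Long get() = afterGcBytes - beforeBytes
}

// Currently used heap, as seen by the JVM.
fun usedHeapBytes(): Long {
    val rt = Runtime.getRuntime()
    return rt.totalMemory() - rt.freeMemory()
}

// Runs `work` and samples the heap before it, after it, and after a GC request.
fun measureHeap(work: () -> Unit): HeapReport {
    System.gc() // best-effort baseline; the JVM may ignore the request
    val before = usedHeapBytes()
    work()
    val afterWork = usedHeapBytes()
    System.gc() // best-effort; gives unreachable objects a chance to be freed
    val afterGc = usedHeapBytes()
    return HeapReport(before, afterWork, afterGc)
}
```

Wrapping the page loop from the report in `measureHeap { ... }` would show whether `retainedBytes` stays close to zero once extraction finishes, which is what "freed afterwards" implies. A value that keeps growing across repeated runs would point towards a leak instead. Because `afterWorkBytes` is a single sample, a profiler or the container's memory metrics remain the better tool for observing the actual peak.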