Java Out of Memory Issue
When performing a scan, I get the following error and no new files are added to the folder.
025-06-12T21:32:37.880Z INFO 1 --- [booklore-api] [ virtual-928] c.a.b.s.l.LibraryProcessingService : Processing file: Detailed Map.pdf
Exception in thread "" java.lang.OutOfMemoryError: Java heap space
at java.desktop/java.awt.image.DataBufferInt.
Thanks for reporting this!
Could you please share a few more details to help us investigate?
- Approximate size (MB) and number of pages in the PDF
- Your system specs (RAM, OS, Java version, etc.)
That’ll help us reproduce and figure out how best to handle the memory issue.
It’s 27 MB, 1 page. This is a highly detailed PDF of a map of a city.
I’m running this in a Docker container managed with Portainer. The host has 8 GB of RAM. The Java version is jdk-21.0.7+6.
The PDF rendering library tries to generate a cover image from the first page of the file. Since this PDF has just one very detailed page, it’s likely too large for the library to handle in memory during image generation.
I’ll check if there’s a way to optimize this process or at least add safeguards to prevent crashes like this.
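One possible safeguard, sketched below, is to estimate how large the rendered bitmap would be before rendering and to lower the resolution when it would not fit a memory budget. This is only a sketch: `CoverRenderBudget`, its method names, and the budget value are hypothetical and not taken from Booklore or the PDF library. It assumes the page size is known in PDF points (1/72 inch) and that the raster uses one `int` (4 bytes) per pixel, which is what the `DataBufferInt` frame in the stack trace suggests. It bounds only the output image, so memory used while decoding the page content is not covered.

```java
// Hypothetical helper for illustration; not part of Booklore or any PDF library.
final class CoverRenderBudget {
    private static final double POINTS_PER_INCH = 72.0;
    // Assumption: one int per pixel, as with an int-backed BufferedImage (DataBufferInt).
    private static final long BYTES_PER_PIXEL = 4L;

    /** Estimated size in bytes of the raster for a page rendered at the given DPI. */
    static long estimateBytes(double widthPt, double heightPt, double dpi) {
        long widthPx = (long) Math.ceil(widthPt * dpi / POINTS_PER_INCH);
        long heightPx = (long) Math.ceil(heightPt * dpi / POINTS_PER_INCH);
        return widthPx * heightPx * BYTES_PER_PIXEL;
    }

    /**
     * Halves the DPI, starting from preferredDpi, until the estimate fits the budget.
     * Never goes below minDpi; if the estimate still exceeds the budget at minDpi,
     * minDpi is returned and the caller has to decide whether to skip rendering.
     */
    static double chooseDpi(double widthPt, double heightPt,
                            double preferredDpi, double minDpi, long budgetBytes) {
        double dpi = preferredDpi;
        while (dpi > minDpi && estimateBytes(widthPt, heightPt, dpi) > budgetBytes) {
            dpi = Math.max(minDpi, dpi / 2.0);
        }
        return dpi;
    }
}
```

For example, a 3600 x 3600 pt page (50 x 50 inches) at 300 DPI would need about 900 MB for the raster alone. With a 256 MiB budget, `chooseDpi(3600, 3600, 300, 36, 256L * 1024 * 1024)` drops to 150 DPI, where the estimate is 225 MB.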
Just dropping in to say that I've experienced the same. My server is ridiculously underpowered (2 GB of RAM on a Proxmox-hosted Ubuntu 24.04 box), so I wasn't incredibly surprised at getting an out-of-memory exception. Most of the failures were caused by random PDFs I've picked up over the years and was happy to dispose of anyway. One comparatively large (484 MB), professionally published EPUB that I'd like to keep around also triggered it. The Java details inside the booklore_server are as follows:
openjdk version "21.0.7" 2025-04-15 LTS
OpenJDK Runtime Environment Temurin-21.0.7+6 (build 21.0.7+6-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.7+6 (build 21.0.7+6-LTS, mixed mode, sharing)
My only suggestion for an interim solution would be to make it more evident from the UI when a scan has failed due to exceptions. I only noticed it had failed when I started investigating missing ebooks and found that the scan task had failed without clearly bubbling up the error on the front end.
@adityachandelgit I think there's more to this issue and there's more to be done.
Reasons for the error had not been investigated
I noticed that cover image generation sometimes fails on a fairly simple PDF with a Java out-of-memory error, and I would expect my 4 GB to be enough for that task. My hypothesis is that the problem lies in the PDF document structure, with PDFBox failing to do the job due to some configuration. One particular example you can check is https://www.hispabrickmagazine.com/pdfs/HBM003_EN/HBM003_EN.pdf (by the way, it would be great to add support for series with ISSN!).
Inconsistent state when error occurs
If the image generation thread fails due to an out-of-memory error, the system is left in an inconsistent state. In particular,
- the file is moved from bookdrop to the library filesystem and remains there, although no entry has been created in DB to address it
- the file is deleted from bookdrop, although the record remains in bookdrop database
In fact, the whole process of moving/renaming files should be transactional with the database updates. This is very important, because failing to do so leads to inconsistent states.
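A file system move and a database write cannot share a real transaction, but a compensating action gets close: move the file, attempt the database update, and move the file back if that update fails. The sketch below illustrates the idea under stated assumptions. `FileMoveWithRollback` and `DbAction` are hypothetical names, and `DbAction` merely stands in for whatever Booklore does to create the library record; none of this is taken from the actual code base.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; names are illustrative and not from Booklore.
final class FileMoveWithRollback {

    /** Stand-in for the database write that registers the moved file. */
    interface DbAction {
        void run() throws Exception;
    }

    /**
     * Moves source to target, then runs the database action.
     * If the action fails for any reason, including an OutOfMemoryError
     * raised during processing, the file is moved back before rethrowing.
     */
    static void moveThenRecord(Path source, Path target, DbAction recordInDb) throws Exception {
        Files.createDirectories(target.toAbsolutePath().getParent());
        Files.move(source, target);
        try {
            recordInDb.run();
        } catch (Throwable failure) {
            try {
                // Compensating action: restore the file to where it came from.
                Files.move(target, source);
            } catch (IOException rollbackFailure) {
                // Keep the original failure, but do not lose the rollback problem.
                failure.addSuppressed(rollbackFailure);
            }
            throw failure;
        }
    }
}
```

The file is moved first so that a database record never points at a file that is not there yet. The rollback is best effort: if the JVM is killed between the move and the database write, a startup reconciliation pass would still be needed.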
Better workaround needed
Since the only issue is PDFBox failing to generate the image, why not handle the exception by supplying a placeholder image? The entry would still be created in the DB, and the user could then replace the placeholder with whatever they want.
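A minimal sketch of that fallback, assuming the cover renderer can be wrapped in a `Supplier<BufferedImage>`. `CoverFallback` and its methods are hypothetical names, and the plain grey image is just a stand-in for a real placeholder asset. Catching `OutOfMemoryError` is a judgment call: it usually works when a single large allocation fails, but recovery is not guaranteed in general.

```java
import java.awt.image.BufferedImage;
import java.util.function.Supplier;

// Hypothetical sketch; not Booklore's actual cover generation code.
final class CoverFallback {
    private static final int LIGHT_GRAY_RGB = 0xC0C0C0;

    /** Builds a plain grey image to use when no real cover could be rendered. */
    static BufferedImage placeholder(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, LIGHT_GRAY_RGB);
            }
        }
        return image;
    }

    /**
     * Runs the renderer and returns its result. Falls back to a placeholder
     * if the renderer returns null, throws a runtime exception, or runs out of heap.
     */
    static BufferedImage renderOrPlaceholder(Supplier<BufferedImage> renderer,
                                             int placeholderWidth, int placeholderHeight) {
        try {
            BufferedImage cover = renderer.get();
            return cover != null ? cover : placeholder(placeholderWidth, placeholderHeight);
        } catch (OutOfMemoryError | RuntimeException e) {
            // A real implementation would log the failure and flag the book for the UI.
            return placeholder(placeholderWidth, placeholderHeight);
        }
    }
}
```

With this shape, the import can continue and create the database entry even when the first page cannot be rendered.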
To add to this, one of the PDFBox issues about failing to render dates back to 2019 (https://issues.apache.org/jira/browse/PDFBOX-4690), and it might be related. The issue is still not resolved after the changes were reverted in 2022 :)