[FEATURE] Support PDF page-to-image configurable options
Is your feature request related to a problem? Please describe.
Right now PDF pages are converted to a bitmap and then converted to a PNG image. Some PDFs render just fine with this default, but others turn out a bit wonky.
Describe the solution you'd like
A user should be able to configure the PDF page-to-image options so they can control:
- Quality options (upscale, downscale)
- Target format (JPEG, PNG, etc)
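As a rough illustration of the two options listed above, here is a minimal sketch of what such a configuration could look like. Every name in it (`PageImageFormat`, `PdfPageRenderOptions`, the `scale` field, the 0.25 to 4.0 clamp range) is hypothetical and not taken from Stump's codebase; only the PNG default mirrors the behavior described in this issue.

```rust
use std::str::FromStr;

/// Output formats a rendered page could be encoded to (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageImageFormat {
    Png,
    Jpeg,
}

impl FromStr for PageImageFormat {
    type Err = String;

    /// Parses a user-supplied config value, ignoring case and surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "png" => Ok(Self::Png),
            "jpg" | "jpeg" => Ok(Self::Jpeg),
            other => Err(format!("unsupported page format: {}", other)),
        }
    }
}

/// Hypothetical user-facing options for PDF page-to-image conversion.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PdfPageRenderOptions {
    /// Multiplier relative to the renderer's current output size
    /// (above 1.0 upscales, below 1.0 downscales).
    pub scale: f32,
    /// Image format the rendered bitmap is encoded to.
    pub format: PageImageFormat,
}

impl Default for PdfPageRenderOptions {
    fn default() -> Self {
        // PNG with no extra scaling, matching the current behavior described above.
        Self { scale: 1.0, format: PageImageFormat::Png }
    }
}

impl PdfPageRenderOptions {
    /// Builds options, clamping `scale` to an arbitrary range chosen for this sketch.
    pub fn new(scale: f32, format: PageImageFormat) -> Self {
        Self { scale: scale.clamp(0.25, 4.0), format }
    }
}
```

Clamping the scale is only one possible choice: it keeps a mistyped config value from producing enormous bitmaps, but rejecting out-of-range values with an error would be equally reasonable.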
Additional context
Look at the Alice In Wonderland PDF file on the demo site: the pages do not look good when using the image-based reader. I'm sure that, at a minimum, https://github.com/stumpapp/stump/issues/156#issuecomment-1712655256 is also a factor.
Currently PDF pages are rendered at low quality by default and there's no knob to adjust that. Solving this issue alone would make a big difference. EPUB already works really well, and if PDFs were readable with the built-in reader, this would cover a lot of users' needs.
> Currently PDF pages are rendered at low quality by default and there's no knob to adjust that
That's what this issue is for 🙂 I haven't had time to prioritize building this out myself, but would happily accept contributions. Otherwise, I'll try to get to it sometime in 2025.
The real issue, as I understand it based on conversations with folks more knowledgeable in PDF processing than myself, is that consistent quality for PDF page-to-image rendering is difficult to achieve. That said, it seems that PdfRenderConfig has some knobs we could probably hook into with config values to hopefully give more control and consistent quality.
The other potential solve would be to use something like PDF.js in the browser and not rely on PDF page processing to output an image.
Have you considered https://github.com/ArtifexSoftware/mupdf.js? Here is a demo: https://casper.mupdf.com/wasm/demo/?file=../../docs/mupdf_explored.pdf
> Have you considered https://github.com/ArtifexSoftware/mupdf.js?
I haven't, but if it were to be considered there would have to be Rust bindings readily available since I don't necessarily want to make them myself 😅
A quick search shows there is at least one, but not sure how viable it might be: https://crates.io/crates/mupdf
I've been playing with those bindings for creating a PDF viewer recently -- and as far as I can tell, there is only one issue with them which can cause incorrect behavior but otherwise it's mostly a copy of the Python PyMuPDF API. One neat thing I've read about it is that it can use some tricks to allow reading parts of a PDF file streamed from elsewhere: https://mupdf.readthedocs.io/en/latest/progressive-loading.html
I can look into submitting a PR to switch to MuPDF if you're open to it? I think pdfium is a valid choice too, so I can also look into that. I'm not sure there would be a significant difference between the two libraries when rendering to an image. I think another issue, in addition to this one, is scaling to the actual DPI of the monitor. The renders are very blurry on HiDPI displays.
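To make the HiDPI point concrete, here is a minimal sketch of how a target bitmap size could be derived from a page's size in PDF points. It relies on the fact that default PDF user space units are 1/72 inch; the function name `target_pixel_size` and the idea of passing the client's device pixel ratio to the server are assumptions for illustration, not something Stump does today.

```rust
/// Default PDF user space units are 1/72 inch.
const POINTS_PER_INCH: f32 = 72.0;

/// Computes the bitmap size (in pixels) for a page measured in PDF points.
///
/// `dpi` is the base rendering resolution and `device_pixel_ratio` is the
/// display's scale factor (for example 2.0 on many HiDPI screens).
/// Hypothetical helper: the name and parameters are not from any library.
pub fn target_pixel_size(
    width_pt: f32,
    height_pt: f32,
    dpi: f32,
    device_pixel_ratio: f32,
) -> (u32, u32) {
    // Pixels per point, multiplied by the display's scale factor.
    let factor = dpi / POINTS_PER_INCH * device_pixel_ratio;
    // Round to the nearest pixel and never return a zero-sized dimension.
    let to_px = |pt: f32| (pt * factor).round().max(1.0) as u32;
    (to_px(width_pt), to_px(height_pt))
}
```

For a US Letter page (612 x 792 points), 72 DPI at a ratio of 1.0 gives 612 x 792 pixels, while the same page at a ratio of 2.0 gives 1224 x 1584. Whichever library ends up doing the rendering, a bitmap sized for a ratio of 1.0 has to be stretched by the browser on a 2x display, which is one plausible source of the blur.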
@kirawi Sorry I'm a bit late, I totally missed this:
> I can look into submitting a PR to switch to MuPDF if you're open to it?
I'd be open to it if it is still something you're interested in contributing, but I'd want to test it thoroughly to make sure it is an improvement.
Where would one start looking to integrate mupdf.js?
> Where would one start looking to integrate mupdf.js?
I would suggest looking at some of the examples in the repository for the rust bindings I found: https://github.com/messense/mupdf-rs.
~In particular, this one: https://github.com/messense/mupdf-rs/blob/main/examples/extract_images.rs~ Not sure that actually does what we'd want
I've hacked together a working MuPDF migration prototype with Claude Code.
It compiles, and I have yet to find a PDF that renders pixelated like it used to. Since I built it from main, I have a weird bug where the interface is very dark with orange highlights, ~and some pages seem to render with a transparent background...~ fixed an alpha channel issue.
Is anything generated with an LLM of any use to you, @aaronleopold?
@hollisticated-horse I would take a look, sure. Just make a PR and I can give it a look, most likely some time over the weekend.
Edit to add that it would be easier if whatever you added is based on the breaking/sea-orm branch, but if not, no worries.