[FEATURE] Support PDF page-to-image configurable options

Open aaronleopold opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe.

Right now PDF pages are converted to a bitmap and then converted to a PNG image. Some PDFs render just fine with this default, but others turn out a bit wonky.

Describe the solution you'd like

A user should be able to configure the PDF page-to-image options so they can control:

  • Quality options (upscale, downscale)
  • Target format (JPEG, PNG, etc)
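
As a rough illustration of what such options could look like, here is a minimal sketch in plain Rust. Every name in it (`PdfPageImageOptions`, `PageImageFormat`, `pixel_size`, `file_extension`) is hypothetical and does not exist in Stump; the 144 DPI default is likewise an arbitrary assumption. The only fixed fact it relies on is that the default PDF unit is the point, 1/72 of an inch.

```rust
/// Hypothetical output formats for a rendered PDF page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageImageFormat {
    Png,
    /// JPEG with a quality setting (assumed range 1 to 100).
    Jpeg { quality: u8 },
}

/// Hypothetical user-facing options; none of these names exist in Stump.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PdfPageImageOptions {
    /// Render resolution in dots per inch. At 72 DPI one PDF point maps to one pixel,
    /// so higher values upscale and lower values downscale.
    pub dpi: f32,
    /// Target image format for the rendered page.
    pub format: PageImageFormat,
}

impl Default for PdfPageImageOptions {
    fn default() -> Self {
        // Arbitrary assumed default: 2x the 72 DPI baseline, lossless output.
        Self { dpi: 144.0, format: PageImageFormat::Png }
    }
}

impl PdfPageImageOptions {
    /// Convert a page size in PDF points (1/72 inch) to pixel dimensions.
    pub fn pixel_size(&self, width_pts: f32, height_pts: f32) -> (u32, u32) {
        let scale = self.dpi / 72.0;
        let w = (width_pts * scale).round().max(1.0) as u32;
        let h = (height_pts * scale).round().max(1.0) as u32;
        (w, h)
    }

    /// File extension matching the chosen format.
    pub fn file_extension(&self) -> &'static str {
        match self.format {
            PageImageFormat::Png => "png",
            PageImageFormat::Jpeg { .. } => "jpg",
        }
    }
}
```

With these assumptions, a US Letter page (612 x 792 points) at 144 DPI comes out as 1224 x 1584 pixels. Expressing quality as a DPI value rather than a fixed pixel size keeps the setting independent of each PDF's page dimensions.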

Additional context

Look at the Alice In Wonderland PDF file on the demo site: the pages do not look good in the image-based reader. I'm sure that, at a minimum, https://github.com/stumpapp/stump/issues/156#issuecomment-1712655256 is also a factor.

aaronleopold avatar Sep 23 '23 15:09 aaronleopold

Currently PDF pages are rendered low quality by default and there's no knob to adjust that. Solving this issue alone would make a big difference. EPUB already works really well, and if PDFs were readable with the built-in reader, this would cover a lot of users' needs.

wsdookadr avatar Dec 26 '24 07:12 wsdookadr

Currently PDF pages are rendered low quality by default and there's no knob to adjust that

That's what this issue is for 🙂 I haven't had time to prioritize building this out myself, but would happily accept contributions. Otherwise, I'll try to get to it sometime in 2025.

The real issue, as I understand it based on conversations with folks more knowledgeable in PDF processing than myself, is that consistent quality for PDF page-to-image rendering is difficult to achieve. That said, it seems that PdfRenderConfig has some knobs we could probably hook into with config values to hopefully give more control and consistent quality.

The other potential solution would be to use something like PDF.js in the browser rather than relying on PDF page processing to output an image.

aaronleopold avatar Dec 26 '24 17:12 aaronleopold

Have you considered https://github.com/ArtifexSoftware/mupdf.js? Here is a demo: https://casper.mupdf.com/wasm/demo/?file=../../docs/mupdf_explored.pdf

kirawi avatar Feb 05 '25 18:02 kirawi

Have you considered https://github.com/ArtifexSoftware/mupdf.js?

I haven't, but if it were to be considered there would have to be Rust bindings readily available since I don't necessarily want to make them myself 😅

A quick search shows there is at least one, but not sure how viable it might be: https://crates.io/crates/mupdf

aaronleopold avatar Feb 06 '25 02:02 aaronleopold

I've been playing with those bindings recently to create a PDF viewer. As far as I can tell, there is only one issue with them that can cause incorrect behavior; otherwise, the API is mostly a copy of the Python PyMuPDF API. One neat thing I've read about MuPDF is that it can use some tricks to allow reading parts of a PDF file streamed from elsewhere: https://mupdf.readthedocs.io/en/latest/progressive-loading.html

I can look into submitting a PR to switch to MuPDF if you're open to it? I think pdfium is a valid choice too, so I can also look into that. I'm not sure there would be a significant difference in behavior between the two libraries when rendering to an image. I think one issue in addition to this is scaling to the actual DPI of the monitor. The renders are very blurry on HiDPI displays.
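
To make the HiDPI point concrete, here is a small sketch of the arithmetic involved, assuming the client reports the viewer width in CSS pixels along with its device pixel ratio. The function `hidpi_render_size` and its parameters are hypothetical and are not part of Stump, MuPDF, or pdfium.

```rust
/// Hypothetical helper: pick the pixel size to render a page at so that it
/// stays sharp on a HiDPI display.
///
/// `css_width` is the width of the viewer in CSS pixels and
/// `device_pixel_ratio` is the number of physical pixels per CSS pixel
/// (1.0 on a standard display, commonly 2.0 on HiDPI). The page size in
/// PDF points is only used to preserve the aspect ratio.
pub fn hidpi_render_size(
    css_width: f32,
    device_pixel_ratio: f32,
    page_width_pts: f32,
    page_height_pts: f32,
) -> (u32, u32) {
    // Physical pixels needed to fill the viewer width.
    let width_px = (css_width * device_pixel_ratio).round().max(1.0);
    // Scale the height by the same factor to keep the page's aspect ratio.
    let height_px = (width_px * page_height_pts / page_width_pts).round().max(1.0);
    (width_px as u32, height_px as u32)
}
```

For an 800 CSS pixel wide viewer and a US Letter page (612 x 792 points), this yields 800 x 1035 at a ratio of 1.0 and 1600 x 2071 at a ratio of 2.0. A page rendered at the smaller size and stretched onto the HiDPI screen would be upsampled by the browser, which is one plausible source of the blur.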

kirawi avatar Feb 23 '25 05:02 kirawi

@kirawi Sorry I'm a bit late, I totally missed this:

I can look into submitting a PR to switch to MuPDF if you're open to it?

I'd be open to it if it is still something you're interested in contributing, but I'd want to test it thoroughly to ensure it is an improvement.

aaronleopold avatar Sep 05 '25 14:09 aaronleopold

Where would one look to integrate mupdf.js?

hollisticated-horse avatar Sep 05 '25 16:09 hollisticated-horse

Where would one look to integrate mupdf.js?

I would suggest looking at some of the examples in the repository for the rust bindings I found: https://github.com/messense/mupdf-rs.

~In particular, this one: https://github.com/messense/mupdf-rs/blob/main/examples/extract_images.rs~ Not sure that actually does what we'd want

aaronleopold avatar Sep 17 '25 15:09 aaronleopold

I've hacked together a working MuPDF migration prototype with Claude Code.

It compiles, and I have yet to find a PDF that renders pixelated like it used to. Since I built it from main, I have a weird bug where the interface is very dark with orange highlights, ~and some pages seem to render with a transparent background...~ fixed an alpha channel issue.

Is anything generated with an LLM of any use to you, @aaronleopold?
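
The thread does not say how the alpha channel issue was fixed. One common way to avoid transparent page backgrounds is to composite the rendered RGBA pixels over white before encoding, sketched below in plain Rust. The function name `flatten_on_white` is hypothetical, and the sketch assumes straight (non-premultiplied) 8-bit RGBA input.

```rust
/// Hypothetical helper: composite straight-alpha RGBA pixels over a white
/// background and return tightly packed RGB bytes.
///
/// Any trailing bytes that do not form a full 4-byte pixel are ignored.
pub fn flatten_on_white(rgba: &[u8]) -> Vec<u8> {
    rgba.chunks_exact(4)
        .flat_map(|px| {
            let a = px[3] as u16;
            // out = (c * a + 255 * (255 - a)) / 255, rounded to nearest.
            // The largest intermediate value is 65152, which fits in a u16.
            let blend = |c: u8| ((c as u16 * a + 255 * (255 - a) + 127) / 255) as u8;
            [blend(px[0]), blend(px[1]), blend(px[2])]
        })
        .collect()
}
```

A fully transparent pixel becomes white and a fully opaque pixel keeps its color, so pages without a painted background no longer show through to the reader's dark theme.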

hollisticated-horse avatar Sep 25 '25 16:09 hollisticated-horse

@hollisticated-horse I would take a look, sure. Just make a PR and I can give it a look, most likely some time over the weekend.

Edit to add that it would be easier if whatever you added were based on the breaking/sea-orm branch, but if not, no worries.

aaronleopold avatar Sep 25 '25 17:09 aaronleopold