pdfboxing

Export to image support

Open werenall opened this issue 4 years ago • 3 comments

First of all - kudos for this library! It proves to be very useful to our project in Magnet. However, we need the export-to-image functionality that Apache's PDFBox provides. We thought it would be nice if your library had it as well.

We'd be happy to make a PR with this.

werenall, Apr 02 '20 12:04

> First of all - kudos for this library! It proves to be very useful to our project in Magnet.

Thank you, I'm glad that you're finding it useful.

> However, we need the export-to-image functionality that Apache's PDFBox provides.

OK. Is this functionality already present in any of the Java examples here? https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/

I'm asking because I'm trying to understand what exactly you are trying to do: extract images out of a PDF or ...?

> We thought it would be nice if your library had it as well. We'd be happy to make a PR with this.

OK, but let me understand what you're trying to do first. Then, if you're willing to do the work, that would be great.

dotemacs, Apr 02 '20 12:04

We have PDFs (possibly multi-page) that we need thumbnails for. In our case, each page gets converted into an image. It's similar to Google Drive: the preview doesn't display the PDF itself, just a thumbnail image of it.
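To make the thumbnail step concrete, here is a minimal sketch in plain Java, not anything pdfboxing provides today. It assumes each page has already been rendered to a `BufferedImage` (in Apache PDFBox that would typically come from `PDFRenderer.renderImageWithDPI`), and it only shows the downscaling part. The class name `PageThumbnail` and the method `scaleToWidth` are hypothetical names chosen for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: downscales an already-rendered page image to a thumbnail.
final class PageThumbnail {
    private PageThumbnail() {}

    // Scales `page` to `targetWidth` pixels wide, preserving the aspect ratio.
    static BufferedImage scaleToWidth(BufferedImage page, int targetWidth) {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        double ratio = (double) targetWidth / page.getWidth();
        int targetHeight = Math.max(1, (int) Math.round(page.getHeight() * ratio));

        BufferedImage thumb =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default when shrinking.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }
}
```

The resulting image could then be written out with `javax.imageio.ImageIO.write(thumb, "png", file)`, one file per page.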

werenall, Apr 02 '20 12:04

We have a use case where we want to extract all images from the entire document so we can then do ML on each image. Extracting the text is done separately. PDFBox looks like the right tool for it:

https://docs.aspose.com/pdf/java/extract-images-from-pdf-file/

There is a similar use case with the Node.js library pdf-lib (see the extract-images.zip example, which seems to work well): https://github.com/Hopding/pdf-lib/issues/83#issuecomment-487383843

avocade, Mar 11 '24 16:03