pdfboxing
Export to image support
First of all - kudos for this library! It proves to be very useful to our project in Magnet. However, we need the export-to-image functionality that Apache's PDFBox provides. We thought it would be nice if your library had it as well.
We'd be happy to make a PR with this.
First of all - kudos for this library! It proves to be very useful to our project in Magnet.
Thank you, I'm glad that you're finding it useful.
However, we need the export-to-image functionality that Apache's PDFBox provides.
OK. Is this functionality already present in any of the Java examples here: https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/
I'm asking because I'm trying to understand what exactly you are trying to do: extract images out of a PDF, or something else?
We thought it would be nice if your library had it as well. We'd be happy to make a PR with this.
OK, but let me understand what you're trying to do first. Then, if you're willing to do the work, that would be great.
We have PDFs (possibly multi-page) that we need thumbnails for. In our case, each page gets converted into an image. It's similar to Google Drive: the preview doesn't display the PDF itself, just a thumbnail image of it.
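For illustration, the thumbnail half of that could look roughly like the sketch below. It uses only the JDK and assumes the page has already been rendered to a `BufferedImage` (in PDFBox 2.x that would typically come from `PDFRenderer.renderImageWithDPI`). `PageThumbnail` and `scaleToWidth` are hypothetical names, not part of pdfboxing or PDFBox.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of pdfboxing or PDFBox.
final class PageThumbnail {
    private PageThumbnail() {}

    // Scales an already-rendered page image to targetWidth pixels wide,
    // keeping the aspect ratio. With PDFBox 2.x the input would typically be
    // obtained via: new PDFRenderer(document).renderImageWithDPI(pageIndex, dpi)
    static BufferedImage scaleToWidth(BufferedImage page, int targetWidth) {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        double ratio = (double) targetWidth / page.getWidth();
        int targetHeight = Math.max(1, (int) Math.round(page.getHeight() * ratio));

        BufferedImage thumb =
            new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // Bilinear interpolation gives smoother results when downscaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }
}
```

Calling `PageThumbnail.scaleToWidth(pageImage, 150)` once per rendered page and writing the result with `ImageIO.write(thumb, "png", file)` would give one thumbnail file per page.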
We have a use case where we want to extract all images from the entire document so we can then do ML on each image. Extracting the text is done separately. PDFBox looks like the right tool for it:
https://docs.aspose.com/pdf/java/extract-images-from-pdf-file/
There is a similar use case with the Node.js pdf-lib (the extract-images.zip example, which seems to work well):
https://github.com/Hopding/pdf-lib/issues/83#issuecomment-487383843