Is there a way to get the page image as a stream?

Open nandagopal1992 opened this issue 6 years ago • 4 comments

I am trying to extract all images from a PDF. page.images correctly returns the list of images, but the only way to use these images further is to call image_obj.save(path). I don't want to save these intermediate images; I would rather have them as a file stream. Is there a way to do so?

nandagopal1992 avatar Apr 07 '20 13:04 nandagopal1992

Unfortunately, I don't think I clearly understand this question. Could you add more detail, and an example of what you're trying to do?

jsvine avatar Apr 08 '20 00:04 jsvine

Sorry, I will try to explain it better. I have a PDF which has some text and images. Page.images gives me a list of images (metadata that includes each image's bounding box information). Now, if I need to extract these images, I need to get the bounding box info from the metadata, crop it from the Page object, and then use to_image().

Instead of doing all this, is there a way I can get the images in a page as a bytes stream or something similar?

nandagopal1992 avatar Apr 09 '20 06:04 nandagopal1992

Ah, I see. Thank you for the explanation! PDFPlumber does not directly provide a way to do this, but it might be possible to add this feature in the future, via pdfminer.six's .export_image(...) method.
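
In the meantime, the crop-and-render workflow described earlier in the thread can be kept entirely in memory. The following is only a minimal sketch under a few assumptions: the helper names (clamp_bbox, page_images_as_streams, streams_from_pdf) are hypothetical and not part of pdfplumber, each entry in page.images is assumed to carry x0, top, x1 and bottom keys, and the result is a re-rendered PNG of the page region rather than the original embedded image bytes.

```python
import io


def clamp_bbox(image, page_bbox):
    """Return the image's (x0, top, x1, bottom) box, clipped to the page box.

    Hypothetical helper: clipping avoids asking crop() for a region that
    extends past the page edges.
    """
    px0, ptop, px1, pbottom = page_bbox
    return (
        max(image["x0"], px0),
        max(image["top"], ptop),
        min(image["x1"], px1),
        min(image["bottom"], pbottom),
    )


def page_images_as_streams(page, resolution=150):
    """Yield one in-memory PNG stream per entry in page.images.

    Each stream holds a rasterized crop of the page, not the raw
    embedded image data.
    """
    for image in page.images:
        bbox = clamp_bbox(image, page.bbox)
        rendered = page.crop(bbox).to_image(resolution=resolution)
        buffer = io.BytesIO()
        # PageImage.original is the underlying PIL image.
        rendered.original.save(buffer, format="PNG")
        buffer.seek(0)  # rewind so callers can read from the start
        yield buffer


def streams_from_pdf(path, resolution=150):
    """Collect PNG streams for every image on every page of a PDF."""
    import pdfplumber  # imported lazily; only needed for real PDFs

    streams = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            streams.extend(page_images_as_streams(page, resolution))
    return streams
```

Calling streams_from_pdf("example.pdf") (the path is a placeholder) would return a list of io.BytesIO objects that can be passed to anything accepting a file-like object. Images with a zero-width or zero-height bounding box after clipping would likely need to be skipped before cropping.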

jsvine avatar Apr 10 '20 02:04 jsvine

Can I supply my own OCR text and bounding boxes, instead of using the pdfminer.six package, and then use the extract_table function?

a417886 avatar Nov 21 '20 15:11 a417886