Is there a way to get the page image as a stream?
I am trying to extract all images from a PDF. page.images correctly returns the list of images, but the only way to use these images further is to call image_obj.save(path). I don't want to save these intermediate images; I would prefer to get them as a file stream. Is there a way to do so?
Unfortunately, I don't think I clearly understand this question. Could you add more detail, and an example of what you're trying to do?
Sorry, I will try to explain it better. I have a PDF which has some text and images. page.images gives me a list of images (metadata for each image, including its bounding box). Now, if I need to extract these images, I have to get the bounding box from the metadata, crop that region from the Page object, and then call to_image().
Instead of doing all this, is there a way I can get the images on a page as a bytes stream or something similar?
Ah, I see. Thank you for the explanation! PDFPlumber does not directly provide a way to do this, but it might be possible to add this feature in the future, via pdfminer.six's .export_image(...) method.
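In the meantime, the crop-and-render steps described above can probably be pointed at an in-memory buffer instead of a file path. Below is a minimal sketch, not an official pdfplumber feature. The helper names image_regions_as_streams and pdf_image_streams are made up for illustration, and the sketch assumes that the object returned by to_image() exposes the rendered PIL image as .original. Note that this re-renders each image region of the page at the chosen resolution; it does not recover the original embedded image bytes.

```python
import io


def image_regions_as_streams(page, resolution=150):
    """Render each image region of a pdfplumber page into an in-memory PNG.

    Hypothetical helper: returns a list of io.BytesIO objects, one per image.
    """
    page_x0, page_top, page_x1, page_bottom = page.bbox
    streams = []
    for img in page.images:
        # Clamp the image box to the page bounds, since crop() may reject
        # boxes that extend past the page.
        bbox = (
            max(img["x0"], page_x0),
            max(img["top"], page_top),
            min(img["x1"], page_x1),
            min(img["bottom"], page_bottom),
        )
        if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
            continue  # nothing of this image is visible on the page

        rendered = page.crop(bbox).to_image(resolution=resolution)

        buf = io.BytesIO()
        # Assumption: .original is the underlying PIL image, whose save()
        # accepts any file-like object.
        rendered.original.save(buf, format="PNG")
        buf.seek(0)  # rewind so callers can read from the start
        streams.append(buf)
    return streams


def pdf_image_streams(path, resolution=150):
    """Hypothetical wrapper: one list of streams per page of the PDF at path."""
    import pdfplumber  # imported here so the helper above stays dependency-free

    with pdfplumber.open(path) as pdf:
        return [image_regions_as_streams(p, resolution) for p in pdf.pages]
```

Each returned buffer can be handed to anything that accepts a file-like object, for example PIL.Image.open(buf), or read with buf.getvalue() to get the raw PNG bytes.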
Can I supply my own OCR text and bounding boxes, without using the pdfminer.six package, and then use the extract_table function?