amazon-textract-textractor

Enhancement: Allow the JSON parser to also set the page images when the original document is passed

Open ThomasDelteil opened this issue 2 years ago • 0 comments

The current workaround for PDFs is the following:

from pdf2image import convert_from_path
from textractor.entities.document import Document

# Loading the JSON response
document = Document.open("output.json")

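# Loading the images (one PIL image per PDF page, in page order) and setting them on each page
# Note: zip stops at the shorter sequence, so the page counts should match
images = convert_from_path("doc.pdf")
for page, image in zip(document.pages, images):
    page.image = image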
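
For illustration, the workaround above could be wrapped in a small helper until the parser supports this directly. This is only a sketch: `attach_page_images` is a hypothetical name, not part of textractor, and it assumes only that the document exposes `pages` and that each page accepts an `image` assignment, as in the snippet above. Unlike a bare `zip`, it fails loudly when the page counts differ.

```python
def attach_page_images(document, images):
    """Set one image on each page of an already-parsed document.

    Hypothetical helper (not a textractor API). `document` is anything with a
    `pages` sequence whose items accept `page.image = ...`; `images` is an
    iterable of page images in page order.
    """
    pages = list(document.pages)
    images = list(images)

    # A bare zip() would silently drop the extra pages or images.
    if len(pages) != len(images):
        raise ValueError(
            f"page count mismatch: {len(pages)} pages in the response, "
            f"{len(images)} images from the original document"
        )

    for page, image in zip(pages, images):
        page.image = image
    return document
```

With the objects from the workaround, this would be called as `attach_page_images(Document.open("output.json"), convert_from_path("doc.pdf"))`.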

ThomasDelteil, Aug 09 '23 17:08