amazon-textract-textractor
Enhancement: Allow the JSON parser to also set the page images by passing the original document
The current workaround for PDFs is the following:
from pdf2image import convert_from_path
from textractor.entities.document import Document

# Loading the JSON response
document = Document.open("output.json")

# Loading the images and setting them on each page
images = convert_from_path('doc.pdf')
for page, image in zip(document.pages, images):
    page.image = image
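One caveat with the workaround: `zip` stops silently at the shorter input, so a JSON response and a PDF with different page counts would leave some pages without images and raise no error. Below is a minimal sketch of a stricter helper. The name `attach_page_images` is hypothetical and not part of textractor; it only assumes the document exposes `pages` and that each page accepts an `image` attribute, as in the workaround above.

```python
from typing import Any, Sequence


def attach_page_images(document: Any, images: Sequence[Any]) -> Any:
    """Hypothetical helper: set images[i] on document.pages[i].

    Assumes `document.pages` is iterable and each page has a writable
    `image` attribute, as in the workaround above.
    """
    pages = list(document.pages)
    # Fail loudly instead of letting zip() drop the unmatched pages.
    if len(pages) != len(images):
        raise ValueError(
            f"page count mismatch: {len(pages)} pages, {len(images)} images"
        )
    for page, image in zip(pages, images):
        page.image = image
    return document
```

With the workaround above, this could be called as `attach_page_images(document, convert_from_path('doc.pdf'))`.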