amazon-textract-textractor
start_document_analysis does not support List of Images
The documentation says start_document_analysis supports a list of PIL images, but in the source code (https://github.com/aws-samples/amazon-textract-textractor/blob/e40f5b0378f9ee24d0a757de414505fb06a4471f/textractor/textractor.py#L488)
it only accepts a string, a bytearray, or a single PIL Image. How do I pass multiple images to this API?
You are right, this seems to be a leftover from a previous implementation. The best way to pass multiple PIL images is to loop over them with the synchronous API, like this:
documents = [extractor.analyze_document(file_source=image, features=[TextractFeatures.FORMS]) for image in images]
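If one bad image should not abort the whole batch, the same loop can be wrapped in a small helper. This is only a sketch: `analyze_each` is a hypothetical name, not part of textractor, and it takes the per-image call as a plain callable so it makes no assumptions about the library itself.

```python
from typing import Any, Callable, Iterable, List, Tuple


def analyze_each(
    images: Iterable[Any],
    analyze_one: Callable[[Any], Any],
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, Exception]]]:
    """Call analyze_one on every image, keeping successes and failures apart.

    Hypothetical helper, not a textractor API. Each entry is paired with the
    position of its image in the input so results can be matched back.
    """
    results: List[Tuple[int, Any]] = []
    failures: List[Tuple[int, Exception]] = []
    for index, image in enumerate(images):
        try:
            results.append((index, analyze_one(image)))
        except Exception as exc:  # keep going; report the failure to the caller
            failures.append((index, exc))
    return results, failures


# Usage with the call from the one-liner above (needs AWS credentials to run):
# results, failures = analyze_each(
#     images,
#     lambda image: extractor.analyze_document(
#         file_source=image, features=[TextractFeatures.FORMS]
#     ),
# )
```

Keep in mind that this still issues one synchronous request per image, so a large batch will take a while.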
Alternatively, you can combine your images into a single PDF file and use the asynchronous start_document_analysis API.
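A rough sketch of the PDF route, using Pillow's multi-page PDF writer. `images_to_pdf_bytes` is a hypothetical helper name, and the commented-out call at the end assumes that start_document_analysis takes an `s3_upload_path` argument for local files; please check the signature of your installed version before relying on it.

```python
import io
from typing import List

from PIL import Image


def images_to_pdf_bytes(images: List[Image.Image]) -> bytes:
    """Merge PIL images into one multi-page PDF, one image per page.

    Hypothetical helper, not a textractor API.
    """
    if not images:
        raise ValueError("at least one image is required")
    # Pillow's PDF writer rejects images with an alpha channel,
    # so normalize every page to RGB first.
    pages = [image.convert("RGB") for image in images]
    buffer = io.BytesIO()
    pages[0].save(buffer, format="PDF", save_all=True, append_images=pages[1:])
    return buffer.getvalue()


# Usage (needs AWS credentials; the bucket name is a placeholder):
# with open("pages.pdf", "wb") as handle:
#     handle.write(images_to_pdf_bytes(images))
# document = extractor.start_document_analysis(
#     file_source="pages.pdf",
#     features=[TextractFeatures.FORMS],
#     s3_upload_path="s3://my-bucket/uploads/",  # assumed parameter, see note above
# )
```

Each image becomes one page, so the page order in the result should follow the order of the input list.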
I opened PR #190 to update the documentation, but I will keep this issue open as a feature enhancement, since supporting List[PIL.Image] as input would improve usability.