amazon-textract-textractor start_document_analysis does not support List of Images

start_document_analysis does not support List of Images

Open tarunn2799 opened this issue 2 years ago • 2 comments

The documentation for start_document_analysis says it supports a list of PIL images, but in the source code (https://github.com/aws-samples/amazon-textract-textractor/blob/e40f5b0378f9ee24d0a757de414505fb06a4471f/textractor/textractor.py#L488) it only accepts a string, a bytearray, or a PIL Image. How do I pass multiple images to this API?

Mar 01 '23 19:03 tarunn2799

You are right, this seems to be a leftover from a previous implementation. The best way to pass multiple PIL images is simply to use a for-loop with the sync API, like this:

documents = [extractor.analyze_document(file_source=image, features=[TextractFeatures.FORMS]) for image in images]
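One caveat with the one-liner above is that a single failed call aborts the whole batch. A minimal sketch of a more forgiving loop follows; `analyze_images` and its `analyze_one` callback are hypothetical helper names, not part of textractor, and the commented usage assumes an already configured `Textractor` instance.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def analyze_images(
    images: Iterable[Any],
    analyze_one: Callable[[Any], Any],
) -> Tuple[Dict[int, Any], Dict[int, Exception]]:
    """Call analyze_one on every image and collect results and errors by index.

    Hypothetical helper: analyze_one is whatever per-image call you want,
    e.g. a lambda wrapping extractor.analyze_document.
    """
    results: Dict[int, Any] = {}
    errors: Dict[int, Exception] = {}
    for index, image in enumerate(images):
        try:
            results[index] = analyze_one(image)
        except Exception as exc:  # keep going so one bad image does not stop the batch
            errors[index] = exc
    return results, errors


# Usage sketch (requires AWS credentials; the profile name is an assumption):
# from textractor import Textractor
# from textractor.data.constants import TextractFeatures
# extractor = Textractor(profile_name="default")
# results, errors = analyze_images(
#     images,
#     lambda img: extractor.analyze_document(
#         file_source=img, features=[TextractFeatures.FORMS]
#     ),
# )
```

Keying both dictionaries by the image's position makes it easy to retry only the failed pages afterwards.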

Alternatively, you can combine your images into a single PDF file and use the async start_document_analysis API.
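For the PDF route, here is a rough sketch of the merge step using Pillow's multi-page PDF writer. `save_images_as_pdf` is a hypothetical helper name, and the bucket path in the commented call is made up. As far as I can tell, a local file passed to start_document_analysis also needs an `s3_upload_path` so the library can upload it first, so treat that call as an assumption and check it against the current signature.

```python
from typing import TYPE_CHECKING, List, Sequence

if TYPE_CHECKING:  # only needed for type hints; the images are passed in already loaded
    from PIL import Image


def save_images_as_pdf(images: Sequence["Image.Image"], pdf_path: str) -> str:
    """Write the given PIL images to one multi-page PDF and return its path."""
    if not images:
        raise ValueError("at least one image is required")
    # Pillow's PDF writer rejects some modes (e.g. RGBA), so normalize to RGB first.
    pages: List["Image.Image"] = [image.convert("RGB") for image in images]
    first, rest = pages[0], pages[1:]
    # save_all=True together with append_images writes one PDF page per image.
    first.save(pdf_path, format="PDF", save_all=True, append_images=rest)
    return pdf_path


# Usage sketch (requires AWS credentials; bucket and prefix are hypothetical):
# pdf_path = save_images_as_pdf(images, "pages.pdf")
# lazy_document = extractor.start_document_analysis(
#     file_source=pdf_path,
#     features=[TextractFeatures.FORMS],
#     s3_upload_path="s3://my-bucket/textract-uploads/",
# )
```

The trade-off against the sync loop is one asynchronous job for all pages instead of one request per image, at the cost of routing the file through S3.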

Mar 02 '23 06:03 ThomasDelteil

I opened PR #190 to update the documentation, but I will keep this issue open as a feature enhancement, since supporting List[PIL.Image] as input would improve usability.

Mar 09 '23 15:03 Belval
