amazon-textract-textractor
Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can
Hello, I'm having an issue with the amazon-textract-textractor library. It doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can detect it.
Here is the AWS Textract Demo:
amazon-textract-textractor:
Here is the sample code:

    from textractor import Textractor

    extractor = Textractor(profile_name="")
    document = extractor.analyze_expense(
        file_source="test.jpg",
        save_image=False,
    )
    expense_doc = document.expense_documents[0]
    summary_fields = expense_doc.summary_fields
    line_field = expense_doc.line_items_groups
    print(summary_fields)
Sample Receipt:
I am not sure what the backend implementation of the Textract demo is, but I have personally noticed that Textract async calls produce better results than their sync equivalents.
Given that your input is just an image (a one-page document), it can be very tempting to call the sync API extractor.analyze_expense() because it's quicker and has less overhead. Try using the async extractor.start_expense_analysis() instead and compare your results.
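A rough sketch of that comparison, in case it helps. It assumes the async call needs an S3 location to upload the local image to (passed as s3_upload_path), and that the returned document exposes expense_documents the same way the sync one does. The helper names mentions_field and compare_sync_and_async, and the bucket path in the usage note, are placeholders of mine and not part of the library. Since the structure of summary_fields isn't shown in this thread, the check just looks for the field name in the printed output.

```python
def mentions_field(summary_fields, field_type="INVOICE_RECEIPT_ID"):
    """Return True if field_type appears in the printed summary fields.

    Deliberately crude: it only inspects the string form, because the
    exact structure of summary_fields is not assumed here.
    """
    return field_type in str(summary_fields)


def compare_sync_and_async(image_path, s3_upload_path, profile_name):
    """Run the same image through the sync and async expense APIs."""
    # Imported here so mentions_field can be used without the library installed.
    from textractor import Textractor

    extractor = Textractor(profile_name=profile_name)

    # Synchronous call, as in the original sample code.
    sync_doc = extractor.analyze_expense(
        file_source=image_path,
        save_image=False,
    )

    # Asynchronous call. Assumption: a local file has to be uploaded to S3
    # first, so an upload location is supplied via s3_upload_path.
    async_doc = extractor.start_expense_analysis(
        file_source=image_path,
        s3_upload_path=s3_upload_path,
    )

    results = {}
    for label, doc in (("sync", sync_doc), ("async", async_doc)):
        summary_fields = doc.expense_documents[0].summary_fields
        print(label, summary_fields)
        results[label] = mentions_field(summary_fields)
    return results
```

You would call it with something like compare_sync_and_async("test.jpg", "s3://your-bucket/some-prefix/", "your-profile"), swapping in your own bucket and profile. If the async entry comes back True and the sync one False, that points at the API mode rather than the library's parsing.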