amazon-textract-textractor

Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can

Open: arsher-b opened this issue 1 year ago • 1 comment

Hello, I'm having an issue with the amazon-textract-textractor library. It doesn't detect the INVOICE_RECEIPT_ID field, but the AWS Textract Demo does detect it for the same receipt.

Here is the AWS Textract Demo: Screenshot 2024-11-20 at 11 06 06 AM

amazon-textract-textractor: Screenshot 2024-11-20 at 11 06 49 AM

Here is the sample code:

    from textractor import Textractor

    extractor = Textractor(profile_name="")

    document = extractor.analyze_expense(
        file_source="test.jpg",
        save_image=False,
    )
    expense_doc = document.expense_documents[0]
    summary_fields = expense_doc.summary_fields
    line_field = expense_doc.line_items_groups
    print(summary_fields)

Sample Receipt: test

arsher-b, Nov 20 '24 03:11

I am not sure how the backend of the Textract demo is implemented, but I have personally noticed that Textract's async calls produce better results than their sync equivalents.

Given that your input is just an image (a one-page document), it can be very tempting to call the sync API, extractor.analyze_expense(), because it's quicker and has less overhead. Try using the async extractor.start_expense_analysis() instead and compare your results.

Chuukwudi, Dec 02 '24 23:12
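To make the sync/async comparison suggested above concrete, here is a minimal sketch that works on the raw JSON responses rather than on Textractor objects. It assumes you have both responses as plain dicts in the AnalyzeExpense shape (ExpenseDocuments, then SummaryFields, each with Type.Text and ValueDetection.Text), for example from boto3's analyze_expense and get_expense_analysis. The helper names and the two sample responses are made up for illustration and are not part of the library or taken from the receipt in this issue.

```python
from typing import Dict, List


def summary_fields_by_type(response: dict) -> Dict[str, List[str]]:
    """Map each summary field type (e.g. INVOICE_RECEIPT_ID) to its detected values."""
    fields: Dict[str, List[str]] = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text", "UNKNOWN")
            value = field.get("ValueDetection", {}).get("Text", "")
            fields.setdefault(field_type, []).append(value)
    return fields


def missing_types(baseline: dict, other: dict) -> List[str]:
    """Return field types that appear in `other` but not in `baseline`."""
    baseline_types = set(summary_fields_by_type(baseline))
    other_types = set(summary_fields_by_type(other))
    return sorted(other_types - baseline_types)


# Hypothetical responses, invented for illustration only.
sync_response = {
    "ExpenseDocuments": [
        {"SummaryFields": [
            {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Example Store"}},
            {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "42.00"}},
        ]}
    ]
}
async_response = {
    "ExpenseDocuments": [
        {"SummaryFields": [
            {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Example Store"}},
            {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "42.00"}},
            {"Type": {"Text": "INVOICE_RECEIPT_ID"}, "ValueDetection": {"Text": "12345"}},
        ]}
    ]
}

# Field types the async response has that the sync response lacks.
print(missing_types(sync_response, async_response))
```

If the list is empty for your receipt, the two APIs returned the same field types and the difference with the demo lies elsewhere.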