amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Results 129 amazon-textract-textractor issues

I am processing some fairly simple PDFs from S3 using Textract document detection. For most of these documents, the returned JSON contains very little text. For example, using the pdf...

We're starting an invoice processing project and really like this library, but we're having one interesting issue: The text is all parsed correctly, but then it is not always grouped...

Currency symbols are not identified properly. The pound symbol (£) is recognised as E.

This PR introduces a new way to use Textract and process its output in Python. It provides redesigned APIs for Text, Tables, Forms, Expense and AnalyzeID to improve developer productivity,...

When obtaining predictions through `analyze_document`, the image is converted to JPEG https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845. The compression is enough to degrade the table predictions. We should check and keep the format, assuming that...

bug
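One way to approach the format issue above is to inspect the leading bytes of the input and pass recognised images through unchanged instead of re-encoding them. The sketch below is illustrative only: `sniff_image_format`, `bytes_for_request`, and the `reencode_to_jpeg` callback are hypothetical names, not part of textractor, and it assumes the service accepts PNG, JPEG, and TIFF bytes directly.

```python
from typing import Callable, Optional

# Magic-byte prefixes for the image formats this sketch recognises.
_SIGNATURES = {
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'PNG', 'JPEG', or 'TIFF' based on the leading bytes, else None."""
    for name, prefixes in _SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def bytes_for_request(data: bytes, reencode_to_jpeg: Callable[[bytes], bytes]) -> bytes:
    """Keep the original bytes when the format is recognised.

    `reencode_to_jpeg` is a hypothetical fallback supplied by the caller;
    it is only invoked for inputs whose format is not recognised.
    """
    if sniff_image_format(data) is not None:
        return data
    return reencode_to_jpeg(data)
```

With this shape, a PNG input reaches the request untouched, so no lossy compression is applied before table detection.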

If you extract both LAYOUT and TABLES, the tables for some reason are printed at the end of the output, rather than linearized correctly. Related issue: https://github.com/aws-samples/amazon-textract-textractor/issues/274 My code: `from...

The codebase has this line: https://github.com/aws-samples/amazon-textract-textractor/blob/28d6110b08a3584edc4c87022a41d12961b88688/textractor/entities/table.py#L640 to retrieve the cell content. But there's already `cell.text`. I tried using `cell.text` but noticed it's inaccurate. *Sometimes* it gives an empty string when...

When processing large PDFs, processing the response after Textract has generated it can be noticeably slow. We should profile the response parser to identify the bottlenecks. This seems to be...

enhancement
latency
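A minimal sketch of how such profiling could be set up with the standard library's `cProfile` and `pstats`. The `profile_call` helper and the `parse_response` stand-in are hypothetical; the real parser entry point in textractor is not named in the issue, so substitute whichever function does the parsing.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def parse_response(response):
    """Hypothetical stand-in for the response parser being profiled."""
    return [block["Id"] for block in response["Blocks"]]


if __name__ == "__main__":
    fake_response = {"Blocks": [{"Id": str(i)} for i in range(1000)]}
    _, report = profile_call(parse_response, fake_response)
    print(report)
```

Sorting by cumulative time tends to surface the high-level functions that dominate the run, which is usually the first thing to look at when hunting for a bottleneck.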

Encountered this error in several documents because SELECTION_ELEMENT blocks (selection elements inside a table) do not contain the key 'Text'. Noticed that there is an issue with the same problem...
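The failure described above can be avoided by not indexing `'Text'` directly on every block. The sketch below works on raw Textract response blocks as plain dictionaries; `block_text` is a hypothetical helper, not a textractor function, and returning the `SelectionStatus` value for selection elements is one possible choice rather than the library's behaviour.

```python
def block_text(block: dict) -> str:
    """Return a printable value for a Textract block without raising KeyError.

    SELECTION_ELEMENT blocks carry a 'SelectionStatus' field instead of 'Text',
    so that value is returned for them. Any other block without 'Text'
    yields an empty string.
    """
    if block.get("BlockType") == "SELECTION_ELEMENT":
        return block.get("SelectionStatus", "")
    return block.get("Text", "")


if __name__ == "__main__":
    blocks = [
        {"BlockType": "WORD", "Text": "Total"},
        {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
        {"BlockType": "CELL"},
    ]
    print([block_text(b) for b in blocks])
```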