Edouard Belval
With large PDFs, parsing the response after Textract has generated it can be noticeably slow. We should profile the response parser to identify the bottlenecks. This seems to be...
*Issue #, if available:* #316 (related) *Description of changes:* This PR would make the PDF-to-image conversion lazy (it only runs when a user accesses the image) to avoid...
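A minimal sketch of what "lazy" conversion could look like, assuming a cached property that runs the converter only on first access. `LazyPages`, `fake_converter`, and the `conversions` counter are hypothetical names used for illustration; they are not taken from the PR or from Textractor's code.

```python
from functools import cached_property
from typing import Callable, List


class LazyPages:
    """Hypothetical holder that defers an expensive conversion until first access."""

    def __init__(self, source: str, converter: Callable[[str], List[str]]):
        self._source = source
        self._converter = converter
        self.conversions = 0  # counts how many times the converter actually ran

    @cached_property
    def images(self) -> List[str]:
        # Runs only on first access; the result is then cached on the instance.
        self.conversions += 1
        return self._converter(self._source)


def fake_converter(path: str) -> List[str]:
    # Stand-in for a real PDF-to-image call such as pdf2image.convert_from_path.
    return [f"{path}#page1", f"{path}#page2"]


doc = LazyPages("sample.pdf", fake_converter)
```

With this shape, constructing `doc` costs nothing; the converter runs once, the first time `doc.images` is read, and later reads reuse the cached list.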
Currently, a user who does not have `pdf2image` installed will see an exception when calling `analyze_document` with a PDF. This is problematic because Textract does support single-page PDFs as input.
In the documentation, this example: https://aws-samples.github.io/amazon-textract-textractor/notebooks/visualizing_results.html#Visualizing-the-result-of-a-search does not generate the right output. Expected:  Result:  This occurs when torch is not installed (but might occur when it is installed...
Issue introduced by #197. `get_cells_by_type()` only supports/returns column headers. https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/entities/table.py#L294
*Issue #, if available:* #170 *Description of changes:* The original issue was that word and line bounding boxes were shifted in some cases when page width or page height !=...
When exporting the API output of Textract Tables, an error will be shown when opening the resulting `.xlsx` file in Microsoft Excel.
https://github.com/aws-samples/amazon-textract-textractor/issues/134 was merged and the underlying caller now supports AnalyzeLending. We need to add it to Textractor.
Recent issues such as #121 #122 #123 showed that our current test suite is inadequate and several edge cases are missing. This issue aims to outline a plan for implementing...
### Discussed in https://github.com/aws-samples/amazon-textract-textractor/discussions/350 Originally posted by **samwhealon** April 11, 2024 I have been playing around with this library and the original textract-response-parser. I found that TRP doesn't support returning...