Edouard Belval issues

Large PDF response processing is slow

When processing large PDFs, processing the response after Textract has generated it can be noticeably slow. We should profile the response parser to identify the bottlenecks. This seems to be...

enhancement

latency
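A first profiling pass could look like the minimal sketch below. It assumes nothing about Textractor's internals: `parse_response` is a hypothetical stand-in for the real parser entry point, and only the standard-library `cProfile` and `pstats` modules are used to report where time is spent.

```python
import cProfile
import io
import pstats


def parse_response(response):
    # Hypothetical stand-in for the response parser; swap in the real entry point.
    blocks = response["Blocks"]
    by_id = {block["Id"]: block for block in blocks}
    return [by_id[b["Id"]]["Text"] for b in blocks if b["BlockType"] == "WORD"]


def profile_parser(response, top=10):
    """Run parse_response under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = parse_response(response)
    profiler.disable()
    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call chains are listed first.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Synthetic response with many WORD blocks to mimic a large PDF.
    fake = {"Blocks": [{"Id": str(i), "BlockType": "WORD", "Text": f"w{i}"} for i in range(50_000)]}
    words, report = profile_parser(fake)
    print(len(words))
    print(report)
```

Running the real parser against a saved response from a large PDF, instead of the synthetic one, would show which functions dominate the cumulative time.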

Add LazyObject to lazy load pdf to image conversion

*Issue #, if available:* #316 (related) *Description of changes:* This PR would make the PDF-to-image conversion lazy (it only runs when a user accesses the image) to avoid...
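The idea of lazy conversion can be illustrated with a small wrapper that defers a computation until first access and caches the result. This is a hypothetical sketch, not the `LazyObject` from the PR: the class shape, the `get()` method, and the `loaded` property are assumptions chosen for illustration.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

_UNSET = object()  # sentinel so that a factory returning None is still cached


class LazyObject(Generic[T]):
    """Hypothetical sketch: run an expensive factory only on first access."""

    def __init__(self, factory: Callable[[], T]):
        self._factory: Optional[Callable[[], T]] = factory
        self._value = _UNSET

    @property
    def loaded(self) -> bool:
        return self._value is not _UNSET

    def get(self) -> T:
        # Call the factory once; later calls return the cached value.
        if self._value is _UNSET:
            self._value = self._factory()
            self._factory = None  # drop the closure so it can be garbage-collected
        return self._value


if __name__ == "__main__":
    def convert_page():
        # Stand-in for an expensive PDF-to-image conversion (hypothetical).
        print("converting...")
        return "image"

    page_image = LazyObject(convert_page)
    print(page_image.loaded)  # no conversion has happened yet
    page_image.get()
    page_image.get()          # "converting..." is printed only once
```

For attributes on a class, the standard library's `functools.cached_property` gives the same run-once behavior with less code.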

Downgrade exception to warning when passing PDF to analyze_document

Currently, a user who does not have `pdf2image` installed sees an exception when calling `analyze_document` with a PDF. This is problematic because Textract does support single-page PDFs as input.
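One way to downgrade the failure is to check for the optional dependency and emit a `UserWarning` instead of raising. This is a minimal sketch: `load_pdf_pages` and `_pdf2image_available` are hypothetical helper names, not Textractor functions; only `pdf2image.convert_from_path`, `importlib.util.find_spec`, and `warnings.warn` are real APIs.

```python
import importlib.util
import warnings


def _pdf2image_available() -> bool:
    # Check for the optional dependency without importing it.
    return importlib.util.find_spec("pdf2image") is not None


def load_pdf_pages(path):
    """Hypothetical helper: rasterize a PDF if possible, otherwise warn and continue."""
    if not _pdf2image_available():
        # Textract accepts single-page PDFs directly, so this should not be fatal.
        warnings.warn(
            "pdf2image is not installed; the PDF will be sent to Textract as-is "
            "and page images will not be available.",
            UserWarning,
        )
        return None
    from pdf2image import convert_from_path

    return convert_from_path(path)
```

A caller would then send the raw PDF bytes to Textract when `load_pdf_pages` returns `None` and skip any feature that needs page images (for example visualization).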

Visualizing words with search_words shows wrong results

The documentation example at https://aws-samples.github.io/amazon-textract-textractor/notebooks/visualizing_results.html#Visualizing-the-result-of-a-search does not generate the correct output. Expected: ![image](https://user-images.githubusercontent.com/5399488/230956724-32d8c1e6-9d1d-4377-81f8-5a10b70fa411.png) Result: ![image](https://user-images.githubusercontent.com/5399488/230956843-45468af6-dc9b-467d-9b8a-c90032778de7.png) This occurs when torch is not installed (but might occur when it is installed...

bug

need repro

Table.get_cells_by_type() only supports column header

Issue introduced by #197. `get_cells_by_type()` only supports/returns column headers. https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/entities/table.py#L294

bug
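A fix would have the method honor the requested type rather than always returning column headers. The sketch below is hypothetical and does not mirror `table.py`: the `Cell` dataclass, the `CellType` enum layout, and the grouping return shape are assumptions. It only relies on Textract tagging cells with entity types such as `COLUMN_HEADER`, `TABLE_SECTION_TITLE`, and `TABLE_SUMMARY`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class CellType(Enum):
    # Hypothetical enum; values are Textract cell entity types.
    COLUMN_HEADER = "COLUMN_HEADER"
    SECTION_TITLE = "TABLE_SECTION_TITLE"
    SUMMARY = "TABLE_SUMMARY"


@dataclass
class Cell:
    text: str
    entity_types: Set[str] = field(default_factory=set)


def get_cells_by_type(
    cells: List[Cell], cell_type: Optional[CellType] = None
) -> Dict[CellType, List[Cell]]:
    """Group cells by every requested entity type, not only column headers."""
    wanted = [cell_type] if cell_type is not None else list(CellType)
    grouped: Dict[CellType, List[Cell]] = {t: [] for t in wanted}
    for cell in cells:
        for t in wanted:
            if t.value in cell.entity_types:
                grouped[t].append(cell)
    return grouped
```

Passing a single `CellType` returns only that group, while omitting it returns every type, including empty lists for types absent from the table.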

Add Page to DocumentEntity

*Issue #, if available:* #170 *Description of changes:* The original issue was that word and line bounding boxes were shifted in some cases when page width or page height !=...

Tables exported to Excel with Textractor display an error when opened in Excel

When the API output of Textract Tables is exported, Microsoft Excel shows an error when opening the resulting `.xlsx` file.

bug

Add support for AnalyzeLending in Textractor

https://github.com/aws-samples/amazon-textract-textractor/issues/134 was merged, and the underlying caller now supports AnalyzeLending. We need to add it to Textractor.

enhancement

Implement smoke tests

Recent issues such as #121 #122 #123 showed that our current test suite is inadequate and several edge cases are missing. This issue aims to outline a plan for implementing...

enhancement

Python Support for Column Headers

### Discussed in https://github.com/aws-samples/amazon-textract-textractor/discussions/350 Originally posted by **samwhealon** April 11, 2024 I have been playing around with this library and the original textract-response-parser. I found that TRP doesn't support returning...