Edouard Belval comments

Results: 293 comments of Edouard Belval

Analyze a document with multiple pages

Hi, you will have to use the asynchronous API. It is very similar to the synchronous API, except that the PDF file needs to be in S3 or you can...
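As a rough sketch of the asynchronous flow mentioned above, the snippet below uses the boto3 Textract client calls `start_document_analysis` and `get_document_analysis`: start a job on a PDF that is already in S3, poll until the job finishes, then follow `NextToken` to collect every page of results. The function name `analyze_multipage_pdf`, the chosen feature types, and the polling interval are illustrative assumptions, not part of the original answer. The client is passed in so the function can be exercised without AWS credentials.

```python
import time


def analyze_multipage_pdf(client, bucket, key, poll_seconds=5.0):
    """Run an asynchronous Textract analysis on a PDF stored in S3.

    `client` is expected to behave like boto3.client("textract").
    Returns the list of Block dicts gathered from all result pages.
    """
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],  # assumed features for illustration
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        response = client.get_document_analysis(JobId=job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    # This sketch treats anything other than SUCCEEDED as an error.
    if response["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(
            f"Textract job {job_id} ended as {response['JobStatus']}"
        )

    # Results are paginated; follow NextToken until it is absent.
    blocks = list(response.get("Blocks", []))
    while "NextToken" in response:
        response = client.get_document_analysis(
            JobId=job_id, NextToken=response["NextToken"]
        )
        blocks.extend(response.get("Blocks", []))
    return blocks
```

With real credentials this could be called as `analyze_multipage_pdf(boto3.client("textract"), "my-bucket", "docs/report.pdf")`, where the bucket and key are placeholders. Polling keeps the example short; a notification-based setup may be preferable for long-running jobs.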

For textractor.entities.line.Line - visualize() breaks

I was not able to reproduce this issue with our internal samples. If you can share the Textract response or the original asset needed to reproduce it, I can look...

Caller: allow early return when job incomplete

This is something we could accept a PR for. I think it could be implemented as `extractor.get_status(job_id)` which returns a value from an enum defined in https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/data/constants.py with `IN_PROGRESS`, `SUCCEEDED`,...
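A minimal sketch of what such a status check might look like. Nothing here is Textractor's actual API: the `JobStatus` enum, the standalone `get_status` function, and passing the client explicitly are all hypothetical. The enum members beyond `IN_PROGRESS` and `SUCCEEDED` are borrowed from the job status values that the Textract service itself reports, not from the comment above. The sketch also assumes the job was started as a document analysis job.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical enum mirroring Textract's reported job states."""

    IN_PROGRESS = "IN_PROGRESS"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"  # assumed member, taken from the Textract API
    PARTIAL_SUCCESS = "PARTIAL_SUCCESS"  # assumed member, taken from the Textract API


def get_status(client, job_id):
    """Return the current status of an analysis job without waiting for it.

    `client` is expected to behave like boto3.client("textract").
    MaxResults=1 keeps the response small, since only JobStatus is read.
    """
    response = client.get_document_analysis(JobId=job_id, MaxResults=1)
    return JobStatus(response["JobStatus"])
```

A caller could then return early, for example with `if get_status(client, job_id) is JobStatus.IN_PROGRESS: return None`, instead of blocking until the job completes.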

issue with ordering in extractions, markdown and gettext methods

I will test it first but this looks like a known issue that happens when the LAYOUT predictions do not match the TABLE predictions, causing the reading order to be...

issue with ordering in extractions, markdown and gettext methods

What version of `amazon-textract-textractor` are you using? With 1.8.2 I get: ``` Page 2 of 10 Schneider Electric South East Asia (HQ) Pte. Ltd. Schneider Electric Overseas Asia Pte Ltd...

issue with ordering in extractions, markdown and gettext methods

Thank you for clarifying and sharing the file, I will attempt to reproduce the issue.

issue with ordering in extractions, markdown and gettext methods

We have a fix for this issue that will be included in version 1.8.6. It should be available by March 7th.

issue with ordering in extractions, markdown and gettext methods

Should be fixed in 1.9.0, let me know if that addresses your issue. The tables are now inserted correctly in the output. Note that this will only fix the insertion...

issue with ordering in extractions, markdown and gettext methods

I will leave the issue open until you can confirm that this is fixed.

issue with ordering in extractions, markdown and gettext methods

Thanks for the heads-up. 1.9.0 should be on PyPI now. Note that it can take 1-2 hours for their cache to refresh. See: https://github.com/aws-samples/amazon-textract-textractor/actions/runs/13792246430