Martin Schade

20 comments by Martin Schade

Amazon Textract is continuously improved, and customer feedback like yours helps with that task. Since your post, the service has been updated; in particular, recognition of some currency-related symbols has improved...

That is an API-breaking change. I should have added more initially; it was a quick hack to get it out... Maybe use trp2.convert_queries_to_list_trp2 (https://github.com/aws-samples/amazon-textract-textractor/blob/d324b360dec724fc40bf46fe9f2441e8e403903f/prettyprinter/textractprettyprinter/t_pretty_print.py#L147), or we can add another method...
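To make the idea of "queries to a list" concrete, here is a minimal sketch that works directly on a raw Textract AnalyzeDocument response with the QUERIES feature. It is not the textractprettyprinter implementation and does not reproduce its signature; `queries_to_list` is a hypothetical name. It relies on the documented response shape: a QUERY block carries `Query.Text` (and optionally `Query.Alias`) and points to QUERY_RESULT blocks through an `ANSWER` relationship.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, str, float]


def queries_to_list(response: Dict[str, Any]) -> List[Row]:
    """Hypothetical helper: flatten QUERY/QUERY_RESULT blocks into rows of
    (query_text, alias, answer_text, confidence)."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}  # lookup table for relationship targets
    rows: List[Row] = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        text = query.get("Text", "")
        alias = query.get("Alias", "")
        # Collect every QUERY_RESULT id linked via an ANSWER relationship.
        answer_ids = [
            rid
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "ANSWER"
            for rid in rel.get("Ids", [])
        ]
        if not answer_ids:
            # Unanswered query: keep a row so callers still see the question.
            rows.append((text, alias, "", 0.0))
        for rid in answer_ids:
            result = by_id.get(rid, {})
            rows.append((text, alias, result.get("Text", ""), float(result.get("Confidence", 0.0))))
    return rows
```

Returning plain tuples keeps the output stable even if the block model changes underneath, which is the kind of breakage the comment above is about.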

We should add an option to pass in a function that can be used instead of the fixed logic.
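One way to read "pass in a function instead of the fixed logic" is a callable parameter with the current behavior as the default. The sketch below is hypothetical throughout: `select_words`, `default_keep`, and the "WORD with confidence >= 50" rule are illustrative stand-ins, not the actual logic in the library.

```python
from typing import Any, Callable, Dict, List, Optional

Block = Dict[str, Any]


def default_keep(block: Block) -> bool:
    # Hypothetical stand-in for the existing fixed logic:
    # keep WORD blocks whose confidence is at least 50.
    return block.get("BlockType") == "WORD" and float(block.get("Confidence", 0.0)) >= 50.0


def select_words(blocks: List[Block], keep: Optional[Callable[[Block], bool]] = None) -> List[str]:
    """Return the Text of blocks accepted by `keep`, falling back to default_keep."""
    predicate = keep if keep is not None else default_keep
    return [str(b.get("Text", "")) for b in blocks if predicate(b)]
```

Defaulting to `None` and resolving inside the function keeps existing callers unchanged, while anyone who needs different behavior can pass e.g. `keep=lambda b: b["BlockType"] == "WORD"`.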

Interesting. The ID schema is specific to ID documents; the generic Textract APIs AnalyzeDocument and DetectDocumentText (and their asynchronous Start* and Get* counterparts) allow a very flexible definition of documents, which is covered...

There have been some changes: a BaseBlock was introduced, and TextType is now a general property, not limited to WORD. I could add that, but would then close this PR, or you...

Ruby support would be cool. Can you add some tests and a README?

Sorry for the late response. Could you post a sample image to test?

As a test, I just ran the document you linked, an 80-page 'Infrastructure Funding and Financing Bill', through Textract and got 31,856 words identified, which seems to cover the...
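A word count like the one above can be reproduced by counting WORD blocks in the Textract output. This is a minimal sketch, assuming the paginated results of the asynchronous GetDocumentTextDetection calls (followed via NextToken) have already been merged into a single `Blocks` list; `count_words_per_page` is a hypothetical helper, and blocks without a `Page` field are counted as page 1.

```python
from collections import Counter
from typing import Any, Dict


def count_words_per_page(response: Dict[str, Any]) -> Dict[int, int]:
    """Count WORD blocks per page in a (merged) Textract response."""
    counts: Counter = Counter()
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "WORD":
            # Assumption: a missing Page field means a single-page result.
            counts[int(block.get("Page", 1))] += 1
    return dict(counts)
```

Summing the returned values gives the document total, and the per-page breakdown makes it easy to spot pages where recognition looks thin.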

Blast from the past... The ```find_phrase_in_lines``` (https://github.com/aws-samples/amazon-textract-textractor/blob/4b1e55426fc7fa623afcf210a2e3f5b51edc614c/tpipelinegeofinder/textractgeofinder/tgeofinder.py#L841) was my first implementation for finding a phrase and is essentially replaced by ```find_phrase_on_page``` (https://github.com/aws-samples/amazon-textract-textractor/blob/4b1e55426fc7fa623afcf210a2e3f5b51edc614c/tpipelinegeofinder/textractgeofinder/tgeofinder.py#L769). I see find_intersect_value still uses the "lines" one...
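A plausible reason a page-level search supersedes a line-level one is that a phrase can wrap across a line break. The sketch below illustrates only that difference; it is not the textractgeofinder code, ignores geometry entirely, and `match_within_lines` / `match_across_page` are hypothetical names. Each page is modeled as a list of lines, each line a list of words in reading order.

```python
from typing import List, Optional, Tuple


def _norm(token: str) -> str:
    # Case-insensitive, whitespace-trimmed comparison.
    return token.strip().lower()


def match_within_lines(lines: List[List[str]], phrase: str) -> Optional[Tuple[int, int]]:
    """Search each line on its own; return (line_index, word_index) of the first match."""
    target = [_norm(t) for t in phrase.split()]
    if not target:
        return None
    n = len(target)
    for li, words in enumerate(lines):
        normed = [_norm(w) for w in words]
        for i in range(len(normed) - n + 1):
            if normed[i:i + n] == target:
                return li, i
    return None


def match_across_page(lines: List[List[str]], phrase: str) -> Optional[int]:
    """Flatten all words on the page so a match may span line breaks; return the word offset."""
    target = [_norm(t) for t in phrase.split()]
    if not target:
        return None
    n = len(target)
    words = [_norm(w) for line in lines for w in line]
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return i
    return None
```

For a page whose lines are `["Infrastructure", "Funding"]` and `["and", "Financing", "Bill"]`, only the page-level search finds "funding and financing", because that phrase starts on one line and ends on the next.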