amazon-textract-textractor
Visualizing words with search_words shows wrong results
The documentation example at https://aws-samples.github.io/amazon-textract-textractor/notebooks/visualizing_results.html#Visualizing-the-result-of-a-search does not produce the output shown in the docs.
Expected:

Result:

This occurs when torch is not installed (but might occur when it is installed as well).
The word/line similarity code is definitely buggy. It would be nice to be able to pass in a distance metric, e.g. one from the textdistance package: https://pypi.org/project/textdistance/
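
To make the suggestion concrete, here is a rough sketch of what a pluggable metric could look like. This is not textractor's actual API: `search_words_sketch`, `SimilarityMetric`, and both metric functions are hypothetical names, and the sketch works on plain strings rather than textractor `Word` objects. It uses only the standard library; a normalized similarity function from textdistance could presumably be passed in the same way, as long as it takes two strings and returns a value in [0, 1].

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: a metric takes two strings and returns a similarity in [0.0, 1.0].
SimilarityMetric = Callable[[str, str], float]


def ratio_similarity(a: str, b: str) -> float:
    """Similarity from difflib's matching-blocks ratio (stdlib only)."""
    return SequenceMatcher(None, a, b).ratio()


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity: 1 - edit_distance / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    # Classic two-row dynamic programming over the edit-distance table.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def search_words_sketch(
    words: Iterable[str],
    keyword: str,
    metric: SimilarityMetric = ratio_similarity,
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (word, score) pairs scoring at or above threshold, best match first.

    Hypothetical helper: comparison is case-insensitive, and the caller
    chooses the metric instead of relying on a hard-coded one.
    """
    scored = [(word, metric(word.lower(), keyword.lower())) for word in words]
    matches = [(word, score) for word, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = ["Tabular", "Table", "data", "tables"]
    for word, score in search_words_sketch(sample, "table", metric=levenshtein_similarity):
        print(f"{word}: {score:.3f}")
```

With the metric as a parameter, a wrong match in the visualization could be traced to either the metric or the threshold, and swapped out without touching the search code itself.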