Edouard Belval
Add docstrings to TRDG to generate better automated references.
When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models. There are two possible implementations: - Output bounding boxes in a...
When generating data with `trdg -c 1 -f 64 -w 5 -hw`, the aspect ratio of the output appears to be slightly off, or "squished".
As an enhancement, it would be interesting to add support for Arabic and Hindi scripts. I think adding a new font folder and a new dict for both languages would work.
The preferred usage has always been through the CLI. Unfortunately, this approach is not frictionless when used in a real machine learning pipeline that might include data augmentations. The v1...
Hi, I noticed that the text extracted from an image is the same regardless of whether I use PSM.AUTO_OSD or the default (PSM.AUTO according to the code). Weirder yet,...
I came across @sirfz's post in https://github.com/sirfz/tesserocr/issues/55, which mentioned that you can now pass a timeout parameter to Recognize. Unfortunately, it does not seem to work. ```py with tesserocr.PyTessBaseAPI() as...
This code is relatively old, uses a lot of deprecated APIs, and could use a refactor in order to be maintainable.
This PR introduces a new way to use Textract and process its output in Python. It provides redesigned APIs for Text, Tables, Forms, Expense and AnalyseID to improve developer productivity,...
When obtaining predictions through `analyze_document`, the image is converted to JPEG https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845. The compression is enough to degrade the table predictions. We should check and keep the format, assuming that...
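The "check and keep the format" idea could look roughly like the sketch below. It is only an illustration and not the Textractor implementation: the function names `detect_image_format` and `choose_upload_format` are hypothetical, and it recognizes just PNG and JPEG from their file signatures, falling back to JPEG for anything else.

```python
from typing import Optional

# File signatures (magic bytes) for the two formats handled in this sketch.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "PNG" or "JPEG" based on the leading bytes, or None if unknown."""
    if data.startswith(PNG_MAGIC):
        return "PNG"
    if data.startswith(JPEG_MAGIC):
        return "JPEG"
    return None


def choose_upload_format(data: bytes, fallback: str = "JPEG") -> str:
    """Hypothetical helper: keep the source format when it is recognized,
    so a lossless PNG is not re-encoded as a lossy JPEG."""
    return detect_image_format(data) or fallback
```

With something like this, a PNG input would be sent as PNG, and only unrecognized inputs would be re-encoded to the fallback format.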