Edouard Belval

Results 25 issues of Edouard Belval

Add docstrings to TRDG to generate better automated references.

When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models. There are two possible implementations: - Output bounding boxes in a...

enhancement

When generating data with `trdg -c 1 -f 64 -w 5 -hw`, the aspect ratio of the output appears to be slightly off, or "squished". ![marshwort Oolitic regalities innovate great-grandniece_0](https://user-images.githubusercontent.com/5399488/77765111-29bb0280-7014-11ea-965a-d2248a3af27d.jpg)

It would be interesting to add support for Arabic and Hindi scripts. I think adding a new font folder and a new dict for each language would work.

enhancement

The preferred usage has always been through the CLI. Unfortunately, this approach is not frictionless when used in a real machine learning pipeline that might include data augmentations. The v1...

Hi, I noticed that the text extracted from an image is the same regardless of whether I use PSM.AUTO_OSD or the default (PSM.AUTO according to the code). Weirder yet,...

I came across @sirfz's post in https://github.com/sirfz/tesserocr/issues/55, which mentioned that you could now pass a timeout parameter to Recognize. Unfortunately, it does not seem to work. ```py with tesserocr.PyTessBaseAPI() as...

This code is relatively old, uses a lot of deprecated APIs, and could use a refactor in order to be maintainable.

refactor

This PR introduces a new way to use Textract and process its output in Python. It provides redesigned APIs for Text, Tables, Forms, Expense and AnalyzeID to improve developer productivity,...

When obtaining predictions through `analyze_document`, the image is converted to JPEG (see https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845). The compression is enough to degrade the table predictions. We should check and keep the format, assuming that...

bug
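
To illustrate the "check and keep the format" idea from the JPEG conversion issue above, here is a minimal sketch that inspects the leading bytes of an input and re-encodes only when the format is not already usable. The names `sniff_format` and `needs_reencoding` are hypothetical and not part of Textractor, and the list of accepted formats is an assumption that should be checked against the Textract documentation.

```python
# Hypothetical helpers, not part of Textractor: detect the input format from
# its leading bytes so the original bytes can be forwarded unchanged.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"%PDF-": "PDF",
    b"II*\x00": "TIFF",  # little-endian TIFF
    b"MM\x00*": "TIFF",  # big-endian TIFF
}


def sniff_format(data: bytes):
    """Return the detected format name, or None if the header is unknown."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# Assumption: these are the formats the service accepts as-is.
def needs_reencoding(data: bytes, supported=("PNG", "JPEG", "PDF", "TIFF")) -> bool:
    """True only when the bytes are not already in a supported format."""
    return sniff_format(data) not in supported
```

With a check like this, a PNG input would be passed through losslessly, and the lossy JPEG conversion would only be a fallback for unrecognized inputs.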