Edouard Belval issues

Results: 25 issues by Edouard Belval

Add docstrings

Add docstrings to TRDG to generate better automated references.

Output character-level localization

When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models. There are two possible implementations: - Output bounding boxes in a...

enhancement
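As a rough illustration of what character-level bounding boxes could look like, here is a minimal sketch. It assumes text laid out left to right on a single line with known per-character pixel widths; `character_boxes` and its parameters are hypothetical names and do not reflect how TRDG renders text.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels, relative to the text line.
Box = Tuple[int, int, int, int]


def character_boxes(widths: List[int], height: int, spacing: int = 0) -> List[Box]:
    """Hypothetical helper: one box per character, laid out left to right.

    Assumes every character spans the full line height and that `spacing`
    pixels separate consecutive characters.
    """
    boxes: List[Box] = []
    x = 0
    for width in widths:
        boxes.append((x, 0, x + width, height))
        # Advance the cursor past this character and the gap after it.
        x += width + spacing
    return boxes
```

A real implementation would need per-glyph metrics from the font and would have to apply the same distortions to the boxes as to the image.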

Wrong aspect ratio on handwritten data

When generating data with `trdg -c 1 -f 64 -w 5 -hw`, the aspect ratio of the output appears to be slightly off, or "squished". ![marshwort Oolitic regalities innovate great-grandniece_0](https://user-images.githubusercontent.com/5399488/77765111-29bb0280-7014-11ea-965a-d2248a3af27d.jpg)

Support Arabic and Urdu text

This is an enhancement: it would be interesting to add support for Arabic and Hindi scripts. I think adding a new font folder and a new dict for both languages would work.

enhancement

TextDataRecognitionGenerator as a python module

The preferred usage has always been through the CLI. Unfortunately, this approach is not frictionless when used in a real machine learning pipeline that might include data augmentations. The v1...

Using PSM.AUTO_OSD or default doesn't make any difference

Hi, I noticed that the text extracted from an image is the same regardless of whether I use PSM.AUTO_OSD or the default (PSM.AUTO according to the code). Weirder yet,...

api.Recognize(timeout=1000) does not return after 1 second

I came across @sirfz's post in https://github.com/sirfz/tesserocr/issues/55, which mentioned that you could now pass a timeout parameter to Recognize. Unfortunately, it does not seem to work. ```py with tesserocr.PyTessBaseAPI() as...

Convert code to use functional TensorFlow

This code is relatively old, uses a lot of deprecated APIs, and could use a refactor in order to be maintainable.

refactor

Textractor refactoring

This PR introduces a new way to use Textract and process its output in Python. It provides redesigned APIs for Text, Tables, Forms, Expense and AnalyzeID to improve developer productivity,...

JPEG conversion in `analyze_document` significantly impacts table predictions

When obtaining predictions through `analyze_document`, the image is converted to JPEG (https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845). The compression is enough to degrade the table predictions. We should check and keep the format, assuming that...

bug
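One way to "check and keep the format" is to inspect the leading bytes of the input and re-encode only when the format is not already accepted. The sketch below is an illustration under assumptions, not the fix applied in the repository. It assumes the input is available as raw bytes and that the service accepts PNG, JPEG, PDF and TIFF; `detect_format` and `needs_reencoding` are hypothetical helpers.

```python
from typing import Optional

# Magic-byte prefixes for the formats assumed to be accepted as-is.
_SIGNATURES = {
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "PDF": (b"%PDF-",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian
}


def detect_format(data: bytes) -> Optional[str]:
    """Return the format name if `data` starts with a known signature."""
    for name, prefixes in _SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def needs_reencoding(data: bytes) -> bool:
    """Hypothetical check: re-encode only when the format is unrecognized."""
    return detect_format(data) is None
```

With a check like this, a lossless PNG would be sent unchanged and would not go through lossy JPEG compression.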