TextRecognitionDataGenerator Output character-level localization

When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models.

There are two possible implementations:

Output bounding boxes in a JSON file
Output a mask file with a specific pixel value for each character

An MVP would work with images that are neither skewed nor blurred, but ideally it should work for any configuration.

Nov 30 '19 23:11 Belval

Yep, that would be interesting. Do you need help with it?

Dec 05 '19 04:12 Ownmarc

I already implemented the mask option, which outputs a different pixel value for each character. It's not ready to merge yet, because it still needs tests and handling for the handwritten generators.

I will commit what I have so you can take a look. I also want to implement the format used in this paper, if you would like to try it: https://arxiv.org/pdf/1904.01941.pdf

Here is an example of the current output:

[Image: outwriggled three-in-hand long-standing Arvida deaccessioning_0]

It is rather hard to see, but each character's color is treated as a "label" with the RGB value incremented for each character drawn.

Here you'd have (0, 0, 1), (0, 0, 2), ... (0, 0, 255), (0, 1, 0) etc...
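As a rough illustration of the numbering above, the sketch below maps a character index to its color and back. The function names are hypothetical (they are not part of TextRecognitionDataGenerator), and it assumes the first channel is the most significant one and that (0, 0, 0) is left for the background.

```python
MAX_LABEL = 256 ** 3 - 1  # largest index that fits in three 8-bit channels


def label_to_rgb(label):
    """Map a character index (1 for the first character) to an (R, G, B) triple."""
    if not 0 <= label <= MAX_LABEL:
        raise ValueError("label does not fit in 24 bits")
    return (label >> 16) & 0xFF, (label >> 8) & 0xFF, label & 0xFF


def rgb_to_label(rgb):
    """Inverse mapping: recover the character index from its color."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


print(label_to_rgb(1))    # (0, 0, 1)
print(label_to_rgb(256))  # (0, 1, 0)
```

With this mapping, the character drawn right after (0, 0, 255) gets (0, 1, 0), which matches the sequence described above.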

To create bounding boxes around each character I think skimage could be used: https://muthu.co/draw-bounding-box-around-contours-skimage/
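For reference, here is a minimal sketch of the bounding-box step that uses plain numpy instead of skimage. It assumes the mask has already been collapsed to one integer label per pixel (R * 65536 + G * 256 + B), with 0 meaning background. The name `character_boxes` is made up for this example.

```python
import numpy as np


def character_boxes(labels):
    """Return {label: (x_min, y_min, x_max, y_max)} for every non-zero label.

    Coordinates are pixel indices and the max values are inclusive.
    """
    boxes = {}
    for label in np.unique(labels):
        if label == 0:  # assumed background value
            continue
        ys, xs = np.nonzero(labels == label)
        boxes[int(label)] = (int(xs.min()), int(ys.min()),
                             int(xs.max()), int(ys.max()))
    return boxes


# Tiny synthetic label image with two "characters".
labels = np.zeros((5, 8), dtype=np.uint32)
labels[1:4, 1:3] = 1
labels[0:5, 4:7] = 2
print(character_boxes(labels))  # {1: (1, 1, 2, 3), 2: (4, 0, 6, 4)}
```

Scanning the whole image once per label is simple but slow for long strings; a library routine would likely be faster on large images.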

Dec 05 '19 14:12 Belval

Branch is https://github.com/Belval/TextRecognitionDataGenerator/tree/output-mask

Dec 05 '19 14:12 Belval

The branch is now merged into master; masks can be generated by using -om 1

[Images: acculturation_1, acculturation_1_mask]

The mask is not really human-friendly: each character's pixels are set to a color value that increases by one per character.

Here are the colors for each letter in this case:

a => (0, 0, 1)
c => (0, 0, 2)
c => (0, 0, 3)
u => (0, 0, 4)
...

This gives us a maximum character count of 256³ - 1, or 16777215.

Two things need to be added before the issue can be closed:

Proper documentation for the feature
A sample script to convert that format to a more "normal" binary numpy array.
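A conversion script along those lines might look like the sketch below. It is not the script shipped with the project. It assumes the mask is available as an (H, W, 3) uint8 RGB array, that the first channel is the most significant, and that (0, 0, 0) marks the background. Both function names are hypothetical.

```python
import numpy as np


def mask_to_labels(mask_rgb):
    """Collapse an (H, W, 3) uint8 mask into an (H, W) array of integer labels."""
    m = np.asarray(mask_rgb, dtype=np.uint32)
    return (m[..., 0] << 16) | (m[..., 1] << 8) | m[..., 2]


def labels_to_binary(labels):
    """Return a boolean (H, W) array that is True wherever a character was drawn."""
    return labels > 0


# Tiny synthetic mask standing in for a generated mask file.
mask = np.zeros((2, 4, 3), dtype=np.uint8)
mask[0, 0] = (0, 0, 1)  # first character
mask[1, 2] = (0, 1, 0)  # 256th character
labels = mask_to_labels(mask)
binary = labels_to_binary(labels)
```

A real mask could be loaded with Pillow, for instance `np.asarray(Image.open(path).convert("RGB"))`, before being passed to `mask_to_labels`. A per-character binary mask is then just `labels == k` for the k-th character.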

Jan 02 '20 19:01 Belval

Hi Edouard, does it now support bounding boxes around characters?

Jan 15 '20 02:01 KhanhCon

Right now it does not, is there a standard format you would like me to implement?

Jan 17 '20 13:01 Belval

When the code writes text onto the background image, it has to know the coordinates. Would it be possible to use those coordinates to get the bounding box values and save them to a txt file, like the YOLO format does? E.g. for test.jpg (containing 2 characters), test.txt would contain 2 bounding boxes: 12 33 444 555 and 34 56 67 88
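To make the request concrete, here is a small sketch of writing one box per line to a text file. The coordinate order (x_min, y_min, x_max, y_max) is an assumption, since the example above does not say what the four numbers mean, and the function names are hypothetical. Note that the label files used by Darknet YOLO store a class id followed by a normalized center, width and height, so this plain layout is only "YOLO-like".

```python
from pathlib import Path


def boxes_to_lines(boxes):
    """Format boxes as text, one box per line with four space-separated integers."""
    return "\n".join(" ".join(str(int(v)) for v in box) for box in boxes)


def write_boxes(path, boxes):
    """Write the boxes next to the image, e.g. test.txt for test.jpg."""
    Path(path).write_text(boxes_to_lines(boxes) + "\n")


# Assumed order: (x_min, y_min, x_max, y_max), one tuple per character.
boxes = [(12, 33, 444, 555), (34, 56, 67, 88)]
print(boxes_to_lines(boxes))
```

Calling `write_boxes("test.txt", boxes)` would produce a file with the two lines from the example.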

Apr 30 '20 11:04 yyyash8

I can do that for sure, I'll tag you when the feature is ready.

Apr 30 '20 14:04 Belval

Thanks

Apr 30 '20 14:04 yyyash8