Output character-level localization
When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models.
There are two possible implementations:
- Output bounding boxes in a JSON file
- Output a mask file with a specific pixel value for each character
An MVP would work with images that are neither skewed nor blurred, but ideally it should work for any configuration.
Yep, that would be interesting. Need help on that?
I already implemented the second option: outputting a mask with a different pixel value for each character. It's not ready to merge yet, because it still needs tests and handling of the handwritten generators.
I will commit what I have so you can take a look. I also want to implement the format used in this paper, if you would like to try it: https://arxiv.org/pdf/1904.01941.pdf
Here is an example of the current output:

It is rather hard to see, but each character's color is treated as a "label" with the RGB value incremented for each character drawn.
Here you'd have (0, 0, 1), (0, 0, 2), ..., (0, 0, 255), (0, 1, 0), etc.
To create bounding boxes around each character I think skimage could be used: https://muthu.co/draw-bounding-box-around-contours-skimage/
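As a rough sketch of the bounding-box idea, here is a version that uses plain NumPy instead of skimage. It assumes the mask has been loaded as an H x W x 3 uint8 array, that the background is (0, 0, 0), and that every pixel of a character carries exactly that character's label color. The function name `character_boxes` is made up for illustration and is not part of the repo.

```python
import numpy as np


def character_boxes(mask_rgb):
    """Return {label: (x_min, y_min, x_max, y_max)} with inclusive pixel coordinates."""
    mask = mask_rgb.astype(np.uint32)
    # Collapse the three channels into one integer label per pixel:
    # (0, 0, 1) -> 1, (0, 1, 0) -> 256, and so on.
    labels = (mask[..., 0] << 16) | (mask[..., 1] << 8) | mask[..., 2]

    boxes = {}
    for label in np.unique(labels):
        if label == 0:
            continue  # assumed background
        ys, xs = np.nonzero(labels == label)
        boxes[int(label)] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes
```

Looping over the labels is simple but scans the image once per character, which should be fine for short strings.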
Branch is https://github.com/Belval/TextRecognitionDataGenerator/tree/output-mask
The branch is now merged into master; masks can be generated by using -om 1

The mask is not really human-friendly: each character's pixel value is one higher than that of the character drawn before it.
Here are the colors for each letter in this case:
- a => (0, 0, 1)
- c => (0, 0, 2)
- c => (0, 0, 3)
- u => (0, 0, 4)
- ...
This gives us a maximum character count of 256³ - 1, or 16777215.
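To make the numbering concrete, here is a minimal sketch of the mapping between a character index and its color, assuming the blue channel is the least significant one (as the sequence above suggests) and that index 0 is left for the background. The helper names are hypothetical and not taken from the repo.

```python
from typing import Tuple

MAX_LABEL = 256 ** 3 - 1  # 16777215


def label_to_rgb(label: int) -> Tuple[int, int, int]:
    """Map a character index to its (R, G, B) mask color."""
    if not 0 <= label <= MAX_LABEL:
        raise ValueError("label must fit in three 8-bit channels")
    return (label >> 16) & 0xFF, (label >> 8) & 0xFF, label & 0xFF


def rgb_to_label(rgb: Tuple[int, int, int]) -> int:
    """Inverse of label_to_rgb."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b
```

With this mapping, the 256th character rolls over from (0, 0, 255) to (0, 1, 0).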
Two things need to be added before the issue can be closed:
- Proper documentation for the feature
- A sample script to convert that format to a more "normal" binary numpy array.
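A possible starting point for that conversion script, as a sketch only: it assumes the mask is already loaded as an H x W x 3 uint8 NumPy array with a (0, 0, 0) background, and returns one boolean mask per character. The name `split_character_masks` is an illustration, not an existing function in the repo.

```python
import numpy as np


def split_character_masks(mask_rgb):
    """Return a boolean array of shape (N, H, W), one binary mask per character."""
    # Distinct colors in the mask, sorted lexicographically by (R, G, B),
    # which matches the order in which the labels are incremented.
    colors = np.unique(mask_rgb.reshape(-1, 3), axis=0)
    colors = colors[colors.any(axis=1)]  # drop the assumed background color
    # Compare every pixel against every color; a pixel belongs to a character
    # only if all three channels match.
    return (mask_rgb[None, :, :, :] == colors[:, None, None, :]).all(axis=-1)
```

A single "text vs. background" mask can then be obtained with `split_character_masks(mask).any(axis=0)`.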
Hi Edouard, does it now support bounding boxes around characters?
Right now it does not. Is there a standard format you would like me to implement?
When the code writes text onto the background image, it must already have the coordinates. Would it be possible to use those coordinates to get the bounding box values and save them to a txt file, like the YOLO format does? E.g. for test.jpg (containing 2 characters), test.txt would contain 2 bounding boxes: 12 33 444 555 34 56 67 88
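For reference, the usual YOLO label file has one line per box in the form `class x_center y_center width height`, with the four values normalized by the image size, whereas the example above lists four plain integers per box. A minimal sketch of the conversion from pixel corners to such a line, assuming inclusive corner coordinates and a single class id of 0 (both are assumptions, and `to_yolo_line` is a hypothetical helper):

```python
def to_yolo_line(box, image_width, image_height, class_id=0):
    """Format an (x_min, y_min, x_max, y_max) pixel box as a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min + 1   # corners are treated as inclusive
    height = y_max - y_min + 1
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{width / image_width:.6f} {height / image_height:.6f}"
    )
```

Writing test.txt would then be a matter of joining one such line per character with newlines.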
I can do that for sure, I'll tag you when the feature is ready.
Thanks