Output character-level localization

Open Belval opened this issue 6 years ago • 11 comments

When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models.

There are two possible implementations:

  • Output bounding boxes in a JSON file
  • Output a mask file with a specific pixel value for each character

An MVP would work with images that are neither skewed nor blurred, but ideally it should work for any configuration.
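To make the JSON option concrete, here is a minimal sketch of what one record per image could look like. The schema (the keys "text", "characters", "char", "box") and the helper name boxes_to_json are assumptions for illustration only, not a format the project defines.

```python
import json


def boxes_to_json(text, boxes):
    """Serialize one (x_min, y_min, x_max, y_max) box per character.

    Hypothetical schema: the key names are illustrative assumptions.
    """
    if len(text) != len(boxes):
        raise ValueError("expected exactly one box per character")
    records = [{"char": ch, "box": list(box)} for ch, box in zip(text, boxes)]
    return json.dumps({"text": text, "characters": records}, indent=2)


# Two characters with made-up pixel coordinates.
print(boxes_to_json("ab", [(0, 0, 10, 20), (12, 0, 22, 20)]))
```

Keeping the character next to its box means a consumer never has to re-align boxes with the label string.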

Belval, Nov 30 '19 23:11

Yep, that would be interesting. Need help with that?

Ownmarc, Dec 05 '19 04:12

I already implemented the option of outputting a mask with a different pixel value for each character. It's not ready to merge yet, because tests are missing and the handwritten generators are not handled.

I will commit what I have so you can take a look. I also want to implement the format used in this paper, if you would like to try it: https://arxiv.org/pdf/1904.01941.pdf

Here is an example of the current output:

[Images: generated sample "outwriggled three-in-hand long-standing Arvida deaccessioning_0" and its mask "outwriggled three-in-hand long-standing Arvida deaccessioning_0_mask"]

It is rather hard to see, but each character's color is treated as a "label" with the RGB value incremented for each character drawn.

Here you'd have (0, 0, 1), (0, 0, 2), ... (0, 0, 255), (0, 1, 0) etc...
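The sequence above is a character index written as a 24-bit number across the R, G and B channels. A minimal sketch of that mapping follows; the function names are illustrative, not taken from the branch.

```python
def index_to_rgb(index):
    """Map a 1-based character index to its (R, G, B) label colour."""
    if not 1 <= index <= 256 ** 3 - 1:
        raise ValueError("index does not fit in a 24-bit label")
    return (index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF


def rgb_to_index(rgb):
    """Inverse mapping: recover the character index from a label colour."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


print([index_to_rgb(i) for i in (1, 2, 255, 256)])
# [(0, 0, 1), (0, 0, 2), (0, 0, 255), (0, 1, 0)]
```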

To create bounding boxes around each character I think skimage could be used: https://muthu.co/draw-bounding-box-around-contours-skimage/
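As a dependency-free alternative to the skimage approach in the linked post, the boxes can also be read straight off the label mask with a single pass over the pixels. This is only a sketch: it assumes the background is (0, 0, 0) and that the mask is given as rows of (R, G, B) tuples.

```python
def char_boxes(mask):
    """Return {label: (x_min, y_min, x_max, y_max)} for every non-zero label.

    `mask` is a list of rows, each row a list of (R, G, B) tuples.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, (r, g, b) in enumerate(row):
            label = (r << 16) | (g << 8) | b
            if label == 0:  # assumption: (0, 0, 0) is background
                continue
            if label in boxes:
                x0, y0, x1, y1 = boxes[label]
                boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[label] = (x, y, x, y)
    return boxes


bg, a, c = (0, 0, 0), (0, 0, 1), (0, 0, 2)
tiny_mask = [
    [bg, a, a, bg, c],
    [bg, a, bg, bg, c],
    [bg, bg, bg, bg, c],
]
print(char_boxes(tiny_mask))
# {1: (1, 0, 2, 1), 2: (4, 0, 4, 2)}
```

Because every character has its own label, touching or overlapping glyphs still get separate boxes, which contour detection on a binary image would not guarantee.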

Belval, Dec 05 '19 14:12

Branch is https://github.com/Belval/TextRecognitionDataGenerator/tree/output-mask

Belval, Dec 05 '19 14:12

The branch is now merged into master; masks can be generated by using -om 1

[Images: generated sample "acculturation_1" and its mask "acculturation_1_mask"]

The mask is not really human-friendly: each character's pixels are set to a label value that increases by one per character.

Here are the colors for each letter in this case:

a => (0, 0, 1)
c => (0, 0, 2)
c => (0, 0, 3)
u => (0, 0, 4)
...

This gives us a maximum character count of 256³ - 1, or 16777215.

Two things need to be added before the issue can be closed:

  • Proper documentation for the feature
  • A sample script to convert that format to a more "normal" binary numpy array.
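One possible shape for such a conversion script is sketched below. It assumes the mask has already been loaded as an (H, W, 3) uint8 array (for example with Pillow via np.array(Image.open(path).convert("RGB"))), that labels run from 1 to N, and that 0 is background. The function name is illustrative.

```python
import numpy as np


def mask_to_binary_stack(mask_rgb):
    """Convert an (H, W, 3) uint8 label mask into an (N, H, W) boolean array.

    Plane k - 1 is True where the pixel belongs to character label k.
    Assumes label 0 is background.
    """
    rgb = mask_rgb.astype(np.uint32)
    # Collapse the three channels into one 24-bit label per pixel.
    labels = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    n_chars = int(labels.max())
    ids = np.arange(1, n_chars + 1, dtype=np.uint32)
    # Broadcast (1, H, W) against (N, 1, 1) to get one plane per label.
    return labels[None, :, :] == ids[:, None, None]


demo = np.zeros((2, 3, 3), dtype=np.uint8)
demo[0, 0] = (0, 0, 1)   # one pixel of the first character
demo[1, 1:] = (0, 0, 2)  # two pixels of the second character
stack = mask_to_binary_stack(demo)
print(stack.shape)  # (2, 2, 3)
```

The stack uses N × H × W booleans, so for long strings it may be preferable to keep the single-channel label array and extract planes on demand.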

Belval, Jan 02 '20 19:01

Hi Edouard, does it now support bounding boxes around characters?

KhanhCon, Jan 15 '20 02:01

Right now it does not, is there a standard format you would like me to implement?

Belval, Jan 17 '20 13:01

When the code writes text onto the background image, it must already have the coordinates of that text. Would it be possible to use those coordinates to get the bounding box values and save them to a txt file, like the YOLO format does? E.g. for test.jpg (containing 2 characters) >> test.txt (containing 2 bounding boxes):

12 33 444 555
34 56 67 88
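For reference, a sketch of writing such label lines. Two assumptions are made here: that the four integers in the example mean x_min y_min x_max y_max, and that "YOLO format" means the usual Darknet-style line "class x_center y_center width height", normalized by the image size. The function names and the default class_id are illustrative.

```python
def box_to_plain_line(box):
    """Format a box as four space-separated pixel values."""
    return " ".join(str(v) for v in box)


def box_to_yolo_line(box, img_w, img_h, class_id=0):
    """Format a pixel box (x_min, y_min, x_max, y_max) as a normalized YOLO line."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Two made-up character boxes in a 100 x 200 image.
boxes = [(0, 0, 50, 100), (50, 0, 100, 100)]
lines = [box_to_yolo_line(b, img_w=100, img_h=200) for b in boxes]
print("\n".join(lines))
# 0 0.250000 0.250000 0.500000 0.500000
# 0 0.750000 0.250000 0.500000 0.500000
```

Writing "\n".join(lines) to test.txt next to test.jpg would give one line per character.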

yyyash8, Apr 30 '20 11:04

I can do that for sure, I'll tag you when the feature is ready.

Belval, Apr 30 '20 14:04

Thanks

yyyash8, Apr 30 '20 14:04