TextRecognitionDataGenerator

Text won't fit on the image

Open: DLumi opened this issue 4 years ago • 1 comment

Hi! I'm using TRDG v1.7.0 with Python 3.7.10, and I can't get my text to fit on some of the generated images. This usually happens with relatively long words like 'Jacksonborough' or with dates (here I tried 1970-01-01). It seems to be mostly random, and it's really annoying since I have to manually clean all the output.

[Two example images attached]

Here are the settings I use:

from trdg.generators import GeneratorFromStrings

# `fonts` is the list of font paths shown further down
generator = GeneratorFromStrings(blur=1,
                                 random_blur=True,
                                 strings=['1970-01-01'],
                                 skewing_angle=2,
                                 random_skew=True,
                                 background_type=2,
                                 word_split=True,
                                 width=224,
                                 size=64,
                                 fit=True,
                                 alignment=0,
                                 margins=[15, 10, 15, 10],
                                 fonts=fonts,
                                 count=-1
                                 )

Fonts are all stock ones, though I limited them to this list:

fonts = [
    '~/arial.ttf',
    '~/arialbd.ttf',
    '~/consolab.ttf',
    '~/consola.ttf',
    '~/bahnschrift.ttf',
    '~/Caladea-Bold.ttf',
    '~/Caladea-Regular.ttf',
    '~/calibrib.ttf',
    '~/calibri.ttf',
    '~/cambriab.ttf',
    '~/Candarab.ttf',
    '~/Candara.ttf',
    '~/Carlito-Bold.ttf',
    '~/Carlito-Regular.ttf',
    '~/constanb.ttf',
    '~/constan.ttf',
    '~/corbelb.ttf',
    '~/corbel.ttf',
    '~/courbd.ttf',
    '~/cour.ttf',
    '~/DejaVuSans.ttf',
    '~/DejaVuSansCondensed.ttf',
    '~/DejaVuSerif.ttf',
    '~/DejaVuSerifCondensed.ttf',
    '~/ebrimabd.ttf',
    '~/ebrima.ttf',
    '~/framd.ttf',
    '~/georgia.ttf',
    '~/micross.ttf',
    '~/palab.ttf',
    '~/pala.ttf',
    '~/segoeui.ttf',
    '~/seguisb.ttf',
    '~/tahomabd.ttf',
    '~/tahoma.ttf',
    '~/trebucbd.ttf',
    '~/trebuc.ttf',
    '~/verdana.ttf'
]

DLumi, Jul 14 '21 10:07

Setting fit to False seems to reduce the number of errors with the dates: I got about 6 corrupted images per 250, compared with every 3rd or 4th image when it is set to True. However, the issue is still there, and it doesn't solve the problem with long words.

DLumi, Jul 14 '21 11:07
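
A possible explanation, offered as a rough sketch rather than a confirmed diagnosis: if the generator scales the rendered text to the image height minus the vertical margins and then pastes it onto a canvas of fixed width, any text whose scaled width exceeds the width minus the horizontal margins would be cut off. The Python below only does that arithmetic for the settings in the first post (width=224, size=64, margins=[15, 10, 15, 10]). The function names, the margin order (top, left, bottom, right), and the pixel measurements in the usage lines are assumptions made for illustration; they are not taken from TRDG's source or from the images in this issue.

```python
def scaled_text_width(text_w, text_h, size, margins):
    """Width of a text render after scaling it to fit the image height.

    Assumes the text is resized so that its height equals `size` minus the
    vertical margins, keeping the aspect ratio (an assumed model of the
    generator, not a confirmed one).
    """
    top, left, bottom, right = margins  # assumed order: top, left, bottom, right
    target_h = size - (top + bottom)
    if target_h <= 0:
        raise ValueError("vertical margins leave no room for the text")
    return int(text_w * target_h / text_h)


def fits(text_w, text_h, width, size, margins):
    """True if the scaled text fits between the horizontal margins."""
    top, left, bottom, right = margins
    available_w = width - (left + right)
    return scaled_text_width(text_w, text_h, size, margins) <= available_w


# Settings from the first post.
WIDTH, SIZE, MARGINS = 224, 64, (15, 10, 15, 10)

# Hypothetical measurements of the same string in one font:
# a render that keeps the font's vertical padding, and a tightly cropped one.
padded_render = (330, 60)
tight_render = (320, 45)

print(fits(*padded_render, WIDTH, SIZE, MARGINS))  # True  (187 px <= 204 px)
print(fits(*tight_render, WIDTH, SIZE, MARGINS))   # False (241 px > 204 px)
```

If this model holds, a tighter crop gives the text a wider aspect ratio, so it scales to a larger width and overflows more often. That would be consistent with fit=True producing more corrupted images than fit=False, and with long words overflowing in either mode. Wider fonts in the list would hit the limit first, which could also explain why the failures look random.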