TextRecognitionDataGenerator
Text won't fit on the image
Hi! I'm using TRDG v1.7.0 with Python 3.7.10, and on some of the generated images I can't get the text to fit within the image. This usually happens with relatively long words like 'Jacksonborough' or with dates (here I tried 1970-01-01). It looks mostly random, and it's really annoying since I have to clean all the output by hand.

Here are the settings I use:

```python
from trdg.generators import GeneratorFromStrings

generator = GeneratorFromStrings(
    blur=1,
    random_blur=True,
    strings=['1970-01-01'],
    skewing_angle=2,
    random_skew=True,
    background_type=2,
    word_split=True,
    width=224,
    size=64,
    fit=True,
    alignment=0,
    margins=[15, 10, 15, 10],
    fonts=fonts,
    count=-1,
)
```
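One possible cause, sketched under an assumption: from a reading of trdg's resize step (not verified for every version), the rendered text appears to be scaled to a height of `size` minus the top and bottom margins, keeping its aspect ratio, and then pasted onto a canvas exactly `width` pixels wide, with nothing shrinking it horizontally. Taking `margins` as (top, left, bottom, right), these settings leave 34 px of text height and 204 px of usable width, so a word whose rendering is wider than about 6:1 would be cut off. The helper `text_fits` is hypothetical and only models that assumed rule:

```python
def text_fits(text_w, text_h, size=64, width=224, margins=(15, 10, 15, 10)):
    """Rough check of whether rendered text fits a fixed-width trdg image.

    Assumption (not confirmed trdg behavior): the text is scaled to height
    size - top - bottom with its aspect ratio kept, then pasted on a canvas
    that is exactly `width` pixels wide. Margins are (top, left, bottom, right).
    """
    top, left, bottom, right = margins
    target_h = size - top - bottom              # 64 - 15 - 15 = 34 px of text height
    scaled_w = int(text_w * target_h / text_h)  # width after the aspect-preserving resize
    available_w = width - left - right          # 224 - 10 - 10 = 204 px of usable width
    return scaled_w <= available_w


# Made-up rendered sizes, not measured from trdg:
print(text_fits(300, 50))  # True  (scaled to 204 px)
print(text_fits(300, 40))  # False (scaled to 255 px)
```

Under the same assumption, `fit=True` would make overflow more likely: cropping the empty space above and below the glyphs lowers the rendered height, so the same word gets scaled wider.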
The fonts are all stock ones, though I limited them to this list:

```python
fonts = [
    '~/arial.ttf',
    '~/arialbd.ttf',
    '~/consolab.ttf',
    '~/consola.ttf',
    '~/bahnschrift.ttf',
    '~/Caladea-Bold.ttf',
    '~/Caladea-Regular.ttf',
    '~/calibrib.ttf',
    '~/calibri.ttf',
    '~/cambriab.ttf',
    '~/Candarab.ttf',
    '~/Candara.ttf',
    '~/Carlito-Bold.ttf',
    '~/Carlito-Regular.ttf',
    '~/constanb.ttf',
    '~/constan.ttf',
    '~/corbelb.ttf',
    '~/corbel.ttf',
    '~/courbd.ttf',
    '~/cour.ttf',
    '~/DejaVuSans.ttf',
    '~/DejaVuSansCondensed.ttf',
    '~/DejaVuSerif.ttf',
    '~/DejaVuSerifCondensed.ttf',
    '~/ebrimabd.ttf',
    '~/ebrima.ttf',
    '~/framd.ttf',
    '~/georgia.ttf',
    '~/micross.ttf',
    '~/palab.ttf',
    '~/pala.ttf',
    '~/segoeui.ttf',
    '~/seguisb.ttf',
    '~/tahomabd.ttf',
    '~/tahoma.ttf',
    '~/trebucbd.ttf',
    '~/trebuc.ttf',
    '~/verdana.ttf',
]
```
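To narrow down which of these fonts overflow for a given string, a rough sketch could measure each font with Pillow and apply an assumed trdg resize rule (text scaled to height `size - top - bottom` with its aspect ratio kept, canvas fixed at `width`, margins as (top, left, bottom, right)). `pil_measure` and `overflowing_fonts` are hypothetical helpers; `ImageFont.truetype` and `getbbox` are Pillow calls (`getbbox` needs Pillow 8.0 or newer). `getbbox` returns the tight ink box, so this approximates the `fit=True` case, and since only the aspect ratio matters the exact font size is not critical:

```python
import os


def pil_measure(font_path, text, font_size=64):
    """Return (width, height) of the tight ink box of `text` in a font file."""
    from PIL import ImageFont  # lazy import so overflowing_fonts can run without Pillow
    # expanduser in case the '~' paths are meant literally
    font = ImageFont.truetype(os.path.expanduser(font_path), font_size)
    left, top, right, bottom = font.getbbox(text)
    return right - left, bottom - top


def overflowing_fonts(fonts, text, measure=pil_measure,
                      size=64, width=224, margins=(15, 10, 15, 10)):
    """List the fonts whose rendering of `text` is predicted not to fit.

    Assumption (not confirmed trdg behavior): text is scaled to height
    size - top - bottom keeping its aspect ratio; usable width is
    width - left - right.
    """
    top, left, bottom, right = margins
    target_h = size - top - bottom
    available_w = width - left - right
    bad = []
    for path in fonts:
        text_w, text_h = measure(path, text)
        if text_w * target_h / text_h > available_w:  # scaled width exceeds the canvas
            bad.append(path)
    return bad
```

For example, `overflowing_fonts(fonts, 'Jacksonborough')` would return the font files predicted to be cut off for that word, which could then be dropped from the list or handled separately.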
Setting `fit` to `False` seems to reduce the number of errors with dates: I got about 6 bad images out of 250, compared with roughly every 3rd or 4th image being corrupted with `fit=True`. However, the issue doesn't go away, and it doesn't help with long words at all.
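A possible workaround, assuming that `width=-1` (trdg's default) makes trdg size the canvas to the rendered text so nothing is cropped, and that the generator yields PIL images with their labels as in trdg's README: generate at the natural width and bring each image to 224x64 afterwards, shrinking it only when it is too large. `letterbox_box` and `letterbox` are hypothetical helpers built on Pillow's `Image.new`, `resize` and `paste`; the text is left-aligned to mirror `alignment=0`:

```python
def letterbox_box(w, h, target_w=224, target_h=64):
    """Return (new_w, new_h, x, y) for fitting a w x h image into the target.

    The image is only ever shrunk (never enlarged), left-aligned to mirror
    alignment=0, and centered vertically.
    """
    scale = min(target_w / w, target_h / h, 1.0)
    new_w = max(1, int(w * scale))
    new_h = max(1, int(h * scale))
    x = 0                        # left-aligned
    y = (target_h - new_h) // 2  # vertically centered
    return new_w, new_h, x, y


def letterbox(img, target_w=224, target_h=64, fill=(255, 255, 255)):
    """Place a PIL image on a target_w x target_h canvas without cropping it."""
    from PIL import Image  # lazy import: letterbox_box works without Pillow
    new_w, new_h, x, y = letterbox_box(img.width, img.height, target_w, target_h)
    canvas = Image.new("RGB", (target_w, target_h), fill)
    canvas.paste(img.convert("RGB").resize((new_w, new_h)), (x, y))
    return canvas
```

Each `img` from a generator created with `width=-1` would then go through `letterbox(img)`. The trade-off is that long words end up with smaller glyphs than short ones, and the plain `fill` color will not match textured backgrounds such as `background_type=2`.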