How to generate non-broken Bengali synthetic data for text recognition?
Thank you for your awesome work. I tried to generate Bangla synthetic data with this library by following this tutorial: https://textrecognitiondatagenerator.readthedocs.io/en/latest/index.html
First, I created a virtual environment: conda create -n trdg python=3.7 anaconda
Then I activated it: conda activate trdg
Next, I built from source: git clone https://github.com/Belval/TextRecognitionDataGenerator
pip3 install -r requirements.txt
Then I collected a few Bangla .ttf fonts from https://www.omicronlab.com/bangla-fonts.html and stored them in the fonts/bn folder.
When I try to generate data like this:
python3 run.py -c 100 -l bn -f 128
I run into exactly the problem described here: https://github.com/python-pillow/Pillow/issues/3593#issuecomment-455725034
For example, the Bangla word 'দৃষ্টিভঙ্গি' is rendered broken, like this:
https://user-images.githubusercontent.com/11752205/51385490-a4071180-1b49-11e9-9a8e-bc60f6eff959.png
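To make the failure easier to reason about, here is a small sketch (standard library only) that lists the code points in that word. My assumption is that the breakage comes from the two conjuncts formed with the virama (U+09CD) and from the vowel sign i (U+09BF), which is drawn to the left of its consonant even though it is stored after it; a renderer that places glyphs one by one in storage order would get both wrong. The helper name describe is only for illustration.

```python
import unicodedata

# The word from the example above, written with escapes so the exact
# code points are unambiguous: দৃষ্টিভঙ্গি
WORD = "\u09a6\u09c3\u09b7\u09cd\u099f\u09bf\u09ad\u0999\u09cd\u0997\u09bf"

VIRAMA = "\u09cd"        # joins neighbouring consonants into a conjunct
VOWEL_SIGN_I = "\u09bf"  # stored after its consonant, drawn before it


def describe(text):
    """Return (code point, general category, Unicode name) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


if __name__ == "__main__":
    for code_point, category, name in describe(WORD):
        print(code_point, category, name)
    print("viramas:", WORD.count(VIRAMA))
    print("vowel sign i:", WORD.count(VOWEL_SIGN_I))
```

If that assumption is right, any word containing these characters should break in the same way, not just this one.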
How do I solve this complex text layout issue during text-to-image conversion (especially for Bangla) when using your library? Thanks in advance.
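In case it helps with diagnosis: as far as I understand, Pillow only shapes complex scripts such as Bangla when it is built with Raqm (libraqm) support, and falls back to basic layout otherwise. The sketch below checks this with PIL.features.check("raqm"). Here raqm_status is a hypothetical helper name, and I am assuming that this library renders its text through Pillow.

```python
def raqm_status():
    """Report whether Pillow is importable and whether it has Raqm support."""
    try:
        from PIL import features
    except ImportError:
        return "Pillow is not installed"
    # features.check returns a falsy value when the feature is unavailable.
    if features.check("raqm"):
        return "Raqm available"
    return "Raqm missing (basic layout only)"


if __name__ == "__main__":
    print(raqm_status())
```

I can run this inside the trdg environment and post the result if that is useful.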