handwriting-ocr
How to train word-CTC with iam dataset?
I haven't tested it since the recent rework, but it should work as follows. Download the IAM dataset and place it into the data/ folder. Then go into the src/data/ folder and run the data extractor, normalization, and set-creation scripts:
python data_extractor.py -d iam
python data_normalization.py -d iam
python data_create_sets.py -d iam
After that you should have a data/sets folder containing the sets created from the IAM dataset. Then open word_classifier_CTC.ipynb and change the path used for loading words to the location of the train set.
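If it helps, the three preprocessing steps above could be chained in a small helper so they always run in the right order. This is only a rough sketch, not part of the repository: the function names `build_commands` and `run_pipeline` are made up for illustration, the `.py` suffix is assumed for all three scripts, and it assumes you start it from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in this thread; the ".py" suffix is assumed for all three.
STEPS = ["data_extractor.py", "data_normalization.py", "data_create_sets.py"]


def build_commands(dataset="iam", python=sys.executable):
    """Return one argument list per preprocessing step, in the order they must run."""
    return [[python, script, "-d", dataset] for script in STEPS]


def run_pipeline(script_dir="src/data", dataset="iam", sets_dir="data/sets"):
    """Run each step from inside script_dir; stop at the first failing step.

    Returns True if the sets folder exists afterwards (paths are relative to
    the repository root, which is assumed to be the current directory).
    """
    for cmd in build_commands(dataset):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(cmd, cwd=script_dir, check=True)
    return Path(sets_dir).is_dir()
```

Calling `run_pipeline()` from the repository root would then run all three scripts and report whether data/sets was created; `build_commands()` on its own just shows the commands without running anything.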
Hope it works. If there are any issues, let me know, or you can create a pull request fixing them. I would really appreciate it.
@Breta01, can you tell me which datasets the pre-trained model was trained on?
@sai36 So far it has been trained only on my personal dataset (5000 images). It would be great if anybody could help with the training, because I have only limited access to cloud computing resources.