How to train word-CTC with iam dataset?

Open ayush7506 opened this issue 5 years ago • 4 comments

ayush7506 avatar Dec 12 '18 12:12 ayush7506

How to train word-CTC with iam dataset?

I haven't tested it since the recent rework, but it should work as follows. Download the IAM dataset and place it in the data/ folder. Then go into the src/data/ folder and run the data extractor, normalization, and set-creation scripts, in this order:

python data_extractor.py -d iam
python data_normalization.py -d iam
python data_create_sets.py -d iam

You should then have a data/sets folder containing the sets created from the IAM dataset. Next, open word_classifier_CTC.ipynb and change the path used for loading words so that it points to the location of the train set.
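
If you would rather run the preprocessing in one go, the steps above can be chained roughly as follows. This is only an untested sketch, not something shipped with the repo: the helper names `build_commands` and `run_pipeline` are made up for illustration, the script file names are assumed to end in `.py`, and the layout (`src/data/` and `data/sets` directly under the repository root) is assumed from the description above.

```python
import subprocess
import sys
from pathlib import Path

# Assumed script names and order, taken from the steps described above.
SCRIPTS = ["data_extractor.py", "data_normalization.py", "data_create_sets.py"]


def build_commands(dataset="iam", python=sys.executable):
    """Return one command (as an argument list) per preprocessing step."""
    return [[python, script, "-d", dataset] for script in SCRIPTS]


def run_pipeline(repo_root, dataset="iam"):
    """Run the steps in order from src/data/ and return the data/sets path."""
    repo_root = Path(repo_root)
    for cmd in build_commands(dataset):
        # check=True raises CalledProcessError at the first failing step,
        # so later scripts never run on incomplete output.
        subprocess.run(cmd, cwd=repo_root / "src" / "data", check=True)

    sets_dir = repo_root / "data" / "sets"  # assumed output location
    if not sets_dir.is_dir():
        raise FileNotFoundError(f"expected {sets_dir} after preprocessing")
    return sets_dir
```

Calling `run_pipeline(".")` from the repository root should return the data/sets path, which is the folder the notebook's loading path needs to point into.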

Hope it works. If there are any issues, let me know. Or you can open a pull request fixing them; I would really appreciate it.

Breta01 avatar Dec 12 '18 13:12 Breta01

@Breta01, can you tell me which datasets the pre-trained model was trained on?

sai36 avatar Dec 13 '18 14:12 sai36

@sai36 So far it has been trained only on my personal dataset (5000 images). It would be great if anybody could help with the training, because I have only limited access to cloud computing resources.

Breta01 avatar Dec 13 '18 22:12 Breta01