handwriting-ocr How to train word-CTC with iam dataset?

Open ayush7506 opened this issue 5 years ago • 4 comments

How to train word-CTC with iam dataset?

Dec 12 '18 12:12 ayush7506

I haven't tested it since the recent rework, but it should work as follows. Download the IAM dataset and place it in the data/ folder. Then go into the src/data/ folder and run the data extractor, normalization, and set creation scripts, in that order:

python data_extractor.py -d iam
python data_normalization.py -d iam
python data_create_sets.py -d iam

After that, you should have a data/sets folder containing the sets created from the IAM dataset. Then open word_classifier_CTC.ipynb and change the path used for loading words to the location of the train set.
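
For anyone who prefers to run the preprocessing in one go, here is a minimal sketch of chaining the three steps from Python. It is not part of the repository: `SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical names, and it assumes that all three scripts carry a `.py` extension, that it is called from src/data/, and that data/sets sits two levels up at the repository root. Adjust the paths if your layout differs.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: all three scripts live in src/data/ and end in .py.
SCRIPTS = ["data_extractor.py", "data_normalization.py", "data_create_sets.py"]


def build_commands(dataset="iam", python=sys.executable):
    """Return one argument list per preprocessing step, in run order."""
    return [[python, script, "-d", dataset] for script in SCRIPTS]


def run_pipeline(dataset="iam", cwd=".", sets_dir="../../data/sets",
                 runner=subprocess.run):
    """Run each step in order and stop at the first one that fails."""
    for cmd in build_commands(dataset):
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner(cmd, cwd=cwd, check=True)

    # Assumption: data/sets is at the repository root (two levels above src/data/).
    out = Path(cwd) / sets_dir
    if not out.is_dir():
        raise FileNotFoundError(f"expected sets directory not found: {out}")
    return out
```

Calling `run_pipeline()` from inside src/data/ should then either return the path to the sets folder or fail loudly at the step that broke, which makes it easier to report which script needs fixing.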

Hope it works. If there are any issues, let me know. Or you can create a pull request fixing them; I would really appreciate it.

Dec 12 '18 13:12 Breta01

@Breta01, can you tell me which datasets the pre-trained model was trained on?

Dec 13 '18 14:12 sai36

@sai36 So far it has been trained only on my personal dataset (5000 images). It would be great if anybody could help with the training, because I have only limited access to cloud computing resources.

Dec 13 '18 22:12 Breta01
