handwriting-ocr
train a new model
How can I train a new model with a new data set?
Good question, I am wondering the same. Could you explain how to train the model?
Hi,
This depends on the model you want to train. All the notebooks used for model training contain Classifier in their name. These notebooks load data from the data folder (if you haven't already, you have to download the data from the provided URL), process it, and train the model, which is then saved in the models folder.
You don't have to do much more than replace the original data with yours and train the model. Your data has to be in the right format, which depends on the type of model.
Often the data is stored as image files named in the format label_timestamp.jpg.
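To illustrate the naming convention, here is a minimal sketch of how a label could be recovered from such a file name. The helper label_from_filename and the file name hello_123.jpg are hypothetical and not part of the repository; the sketch assumes the label is everything before the last underscore.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the label part of a 'label_timestamp.jpg' style file name.

    Hypothetical helper, not taken from the repository.
    """
    stem = Path(path).stem  # file name without directory and extension
    label, sep, _timestamp = stem.rpartition("_")
    # Without an underscore there is no timestamp, so the whole stem is the label.
    return label if sep else stem


print(label_from_filename("data/hello_123.jpg"))
```

Splitting on the last underscore keeps labels that themselves contain an underscore intact.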
If you need more details, please specify the model you want to train.
I want to train the word-classifier CTC. How do I do it?
OK, that's an easy one.
The training code is in this notebook: WordClassifier-CTC.ipynb. Currently, the data is loaded from the folder data/words2/ (the location is a parameter of loadWordsData()). In this folder I have images of words which are already normalized (grayscale, with a height of 60px). The word images are named word_timestamp.jpg (word is the correct label and timestamp can be just a random number).
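If your own word images are not yet in this form, a rough sketch of the preparation step could look like the following. This is not the repository's normalization code: it assumes Pillow is installed, that keeping the aspect ratio is acceptable, and that grayscale plus a height of 60px is all that is needed (the repository's own normalization may do more). All function names here are hypothetical.

```python
import os
import time

TARGET_HEIGHT = 60  # height mentioned in this thread


def scaled_size(width, height, target_height=TARGET_HEIGHT):
    """Return (width, height) after scaling to target_height, keeping the aspect ratio."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def word_filename(label, stamp=None):
    """Build a 'word_timestamp.jpg' name; the stamp only has to make the name unique."""
    if stamp is None:
        stamp = int(time.time() * 1000)
    return f"{label}_{stamp}.jpg"


def normalize_word_image(src_path, label, out_dir):
    """Convert one word image to grayscale, resize it, and save it under out_dir."""
    # Pillow is imported here so the two helpers above work without it.
    from PIL import Image

    with Image.open(src_path) as img:
        gray = img.convert("L")  # "L" is Pillow's 8-bit grayscale mode
        gray = gray.resize(scaled_size(*gray.size))
    out_path = os.path.join(out_dir, word_filename(label))
    gray.save(out_path)
    return out_path
```

Calling normalize_word_image("scan.png", "sell", "data/words2") would write a file such as sell_1700000000000.jpg into that folder (scan.png is a made-up input name).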
For example, the following image is named sell_15132719.jpg:

loadWordsData() loads the grayscale images and outputs the images and labels as numpy arrays. The model is then trained and saved to the location defined by the save_location variable.
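To give an idea of what such a loading step involves, here is a simplified sketch. It is not the actual loadWordsData() from the repository; load_words is a hypothetical stand-in that assumes NumPy and Pillow are installed and that every .jpg in the folder follows the word_timestamp.jpg naming.

```python
import glob
import os

import numpy as np
from PIL import Image


def load_words(folder):
    """Load all word images from folder as grayscale arrays, plus their labels.

    Hypothetical stand-in, not the repository's loadWordsData().
    """
    paths = sorted(glob.glob(os.path.join(folder, "*.jpg")))
    # Word images differ in width, so they are kept in an object array.
    images = np.empty(len(paths), dtype=object)
    labels = []
    for i, path in enumerate(paths):
        with Image.open(path) as img:
            images[i] = np.asarray(img.convert("L"))  # 2-D uint8 array (height, width)
        # The label is everything before the last underscore in the file name.
        labels.append(os.path.basename(path).rpartition("_")[0])
    return images, np.array(labels)
```

With the folder from this thread, the call would be images, labels = load_words("data/words2/").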
I hope this helps.
What are the .txt files in data/words2? I am going to retrain the char classifier and it needs the .txt files. How can I generate .txt files for my data?
This question is a duplicate of #44.