handwriting-ocr
How to Train CharClassifier?
Please tell me how to train the charClassifier with different data.
I should probably create some documentation in GitHub Wikis.
Anyway, there are two ways to create a dataset for the CharClassifier. Both use the loadCharsData() function, which takes three parameters: charloc, wordloc, and lang (right now it works with the 'cz' or 'en' language).
First Option
Create a main folder and, inside it, one folder for each character (I use one additional empty character, folder 0, for wrongly separated letters). Each of these folders holds the images that match the folder's label. You can see this structure in the folder data/charclas/en/. With the data prepared like this, set the charloc parameter to the location of the main folder, for example charloc='data/charclas/en/'. If you don't want to use this option, set it to an empty string: charloc=''.
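To make the folder layout concrete, here is a minimal sketch of how a folder-per-character dataset could be walked. It is not the code from datahelpers.py: list_char_images and IMAGE_SUFFIXES are hypothetical names, and it assumes the folder name itself is the label and that images are .jpg, .jpeg or .png files.

```python
from pathlib import Path

# Assumption: which file extensions count as images (not taken from the repo).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_char_images(charloc):
    """Hypothetical helper: return (image_paths, labels) for a layout like
    charloc/<label>/<image>, where the folder name is used as the label."""
    paths, labels = [], []
    for char_dir in sorted(Path(charloc).iterdir()):
        if not char_dir.is_dir():
            continue  # skip stray files next to the character folders
        for img in sorted(char_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                paths.append(img)
                labels.append(char_dir.name)  # e.g. "0" for the empty character
    return paths, labels
```

Calling list_char_images('data/charclas/en/') would give one label per image, which is a quick way to check that every character folder actually contains samples before training.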
Second Option
Use images of whole words; you can see this structure in the folder data/words2/. Each image is named label_timestamp.jpg and comes with a companion file named label_timestamp.txt. The file label_timestamp.txt contains an array of positions where the word should be split to get the individual letters. For example, the first value in the array is the start of the first letter in the image, and the second value is the end of the first letter and the start of the second letter. If you don't want to use this option, set it to an empty string: wordloc=''.
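The naming and split positions can be illustrated with a small sketch. It is not the loader from datahelpers.py; all three function names are hypothetical, and it assumes the .txt file stores non-negative integers in some plain-text form (for example "[3, 25, 48, 70]" or "3 25 48 70"), which the description above does not specify.

```python
import re
from pathlib import Path


def label_from_filename(path):
    """Hypothetical helper: take the text before the last underscore,
    so 'hello_1500000000.jpg' gives 'hello'."""
    return Path(path).stem.rsplit("_", 1)[0]


def parse_positions(text):
    """Assumed file format: pull every integer out of the text,
    ignoring brackets, commas and whitespace."""
    return [int(tok) for tok in re.findall(r"\d+", text)]


def letter_spans(positions):
    """Pair consecutive positions: each value ends one letter
    and starts the next one."""
    return list(zip(positions[:-1], positions[1:]))


positions = parse_positions("[3, 25, 48, 70]")
print(label_from_filename("data/words2/hello_1500000000.jpg"))  # hello
print(letter_spans(positions))  # [(3, 25), (25, 48), (48, 70)]
```

With n positions you get n - 1 letter spans, so a word image with a five-letter label should come with six positions if every letter boundary is listed.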
BTW, all functions loading datasets are located in datahelpers.py.
I hope this helps :smiley:.
Yes, wiki documentation would be very good. I think in the second option you mean wordloc='' instead of charloc=''. Thanks anyway.