
How to Train CharClassifier?

Open · yasersakkaf opened this issue 6 years ago · 2 comments

Could you please explain how to train the CharClassifier on different data?

yasersakkaf · Feb 27 '18 12:02

I should probably write some documentation on the GitHub wiki.

Anyway, there are two ways to create a dataset for the CharClassifier. Both use the loadCharsData() function, which takes three parameters: charloc, wordloc, and lang (right now it works with 'cz' or 'en').

First Option

Create a main folder, and inside it create one subfolder for each character (I use one additional empty character, folder 0, for wrongly separated letters). Each subfolder contains images of the character it is named after. You can see this structure in the folder data/charclas/en/. With data prepared like this, set the charloc parameter to the location of the main folder, for example charloc='data/charclas/en/'. If you don't want to use this option, set it to an empty string: charloc=''.
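A minimal sketch, assuming the layout described above (one subfolder per character, named after that character, plus folder 0 for the empty class), of how such a folder could be walked to collect (image path, label) pairs. The function name collect_char_samples and the set of accepted image extensions are hypothetical; this is not the repository's loadCharsData implementation.

```python
import os

# Hypothetical set of accepted image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_char_samples(charloc):
    """Return a sorted list of (image_path, label) pairs.

    Hypothetical helper. Assumes charloc contains one subfolder per
    character and uses the subfolder name as the label; folder "0" is
    simply treated as one more label (the empty character class).
    """
    if not charloc:  # mirrors charloc='' meaning "skip this option"
        return []
    samples = []
    for label in sorted(os.listdir(charloc)):
        folder = os.path.join(charloc, label)
        if not os.path.isdir(folder):
            continue  # ignore stray files in the main folder
        for name in sorted(os.listdir(folder)):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                samples.append((os.path.join(folder, name), label))
    return samples
```

For example, collect_char_samples("data/charclas/en/") would list every image together with the name of the folder it sits in; non-image files are skipped.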

Second Option

You can see this structure in the folder data/words2/. This folder contains images of whole words named label_timestamp.jpg, and each image comes with a file named label_timestamp.txt. That file contains an array of positions where the word should be split to obtain the individual letters. For example, the first value in the array is the start of the first letter in the image, and the second value is the end of the first letter and the start of the second letter. If you don't want to use this option, set it to an empty string: wordloc=''.
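The split positions can be read as letter boundaries: letter i spans from position i to position i+1, so a word with n letters needs n+1 positions (assuming the last value marks the end of the last letter). The sketch below also assumes the .txt file holds plain numbers (possibly with commas, brackets, or scientific notation), that positions are horizontal pixel offsets, and that the label is everything before the last underscore in the file name. All helper names are hypothetical and not part of datahelpers.py.

```python
import os
import re

# Matches integers or floats such as "12", "12.5" or "3.0e+00".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def parse_word_filename(path):
    """Split 'label_timestamp.jpg' into (label, timestamp).

    The timestamp is taken after the last underscore, so labels that
    themselves contain underscores are kept intact.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    label, _, timestamp = stem.rpartition("_")
    return label, timestamp


def read_split_positions(text):
    """Extract the split positions from the .txt contents as ints."""
    return [int(float(v)) for v in _NUMBER.findall(text)]


def letter_spans(label, positions):
    """Pair each letter with its (start, end) column range.

    positions[i] is the start of letter i and positions[i + 1] is its
    end (and the start of letter i + 1).
    """
    if len(positions) != len(label) + 1:
        raise ValueError(
            f"expected {len(label) + 1} positions for {label!r}, got {len(positions)}"
        )
    return [(ch, positions[i], positions[i + 1]) for i, ch in enumerate(label)]


def crop_columns(image, start, end):
    """Cut columns [start, end) out of an image stored as a list of rows."""
    return [row[start:end] for row in image]
```

For a hypothetical file cat_1519732920.jpg whose .txt contains [3, 20, 35, 52], letter_spans("cat", [3, 20, 35, 52]) gives ('c', 3, 20), ('a', 20, 35) and ('t', 35, 52), and each range can be passed to crop_columns to cut out one letter image.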

BTW, all functions loading datasets are located in datahelpers.py.

I hope this helps :smiley:.

Breta01 · Feb 27 '18 14:02

Yes, wiki documentation would be very helpful. I think in the second option you mean wordloc='' instead of charloc=''. Thanks anyway.

yasersakkaf · Feb 28 '18 06:02