SwiftOCR
Optimize Training parameters
Hi,
I just came across your project and it is awesome! 👍
I have worked with some Deep NN and I know it is better to try different parameters (learning rate, momentum, etc.) to get better results and provide the best trained network (using the test group) to the user. I have looked at your training sample and I realized you simply set the parameters to fixed ones (0.7 learning rate , 0.4 momentum ...) and you have a fixed 1-hidden layer size.
I do not know the algorithm you use to recognize the texts (BTW: could you provide a source for the algorithm so I can read about it a bit?), but I would like to try a bunch of parameters to see if I can improve the trained weights of the neural network. The problem is that I cannot find much documentation on your project. Could you give me some insight into how the algorithm works and why you use that learning rate, momentum, etc.?
Thanks!
Hi there,
Small question. What do you mean by "the algorithm you use to recognize the texts"? Do you mean how I separate the characters, how the NN works or something else?
Wow, Thanks for such a fast answer!
I mean how you are separating characters and what you are passing to the input of the NN. I would like to know which components I can tweak to get better character recognition and, for instance, whether a learning rate of 0.1 makes sense or not.
I'm using the Swift-AI framework for the NN and a Connected-component labeling algorithm for getting the bounding boxes of the characters.
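For readers unfamiliar with connected-component labeling, here is a minimal sketch of the idea. It is not SwiftOCR's actual implementation: the `BoundingBox` type, the `boundingBoxes(of:)` function, the `[[Bool]]` image representation (assumed rectangular) and the choice of 4-connectivity are all assumptions made for illustration.

```swift
/// Bounding box of one connected component, in inclusive pixel coordinates.
/// (Hypothetical type for this sketch, not part of SwiftOCR.)
struct BoundingBox: Equatable {
    var minX: Int, minY: Int, maxX: Int, maxY: Int
}

/// Finds 4-connected foreground regions with a flood fill and returns one box per region.
/// `image[y][x] == true` marks a foreground (ink) pixel; all rows must have the same length.
func boundingBoxes(of image: [[Bool]]) -> [BoundingBox] {
    let height = image.count
    let width = image.first?.count ?? 0
    var visited = [[Bool]](repeating: [Bool](repeating: false, count: width), count: height)
    var boxes: [BoundingBox] = []

    for startY in 0..<height {
        for startX in 0..<width where image[startY][startX] && !visited[startY][startX] {
            // A new, unvisited foreground pixel starts a new component.
            var box = BoundingBox(minX: startX, minY: startY, maxX: startX, maxY: startY)
            var stack = [(startX, startY)]
            visited[startY][startX] = true

            while let (x, y) = stack.popLast() {
                // Grow the box so that it contains the current pixel.
                box.minX = min(box.minX, x); box.maxX = max(box.maxX, x)
                box.minY = min(box.minY, y); box.maxY = max(box.maxY, y)

                // Visit the four direct neighbours that are foreground and not yet seen.
                for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)] {
                    let nx = x + dx, ny = y + dy
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    if image[ny][nx] && !visited[ny][nx] {
                        visited[ny][nx] = true
                        stack.append((nx, ny))
                    }
                }
            }
            boxes.append(box)
        }
    }
    return boxes
}
```

Each returned box would then be a candidate character region. Boxes that belong to the same character (the dot of an "i", for example) still have to be merged afterwards.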
> I have looked at your training sample and I realized you simply set the parameters to fixed ones (0.7 learning rate , 0.4 momentum ...) and you have a fixed 1-hidden layer size.
Since this was my first time working with Neural Networks, I searched the Internet for what learning rate and momentum I should use ^^ I would have loved to use a NN library that allows more than one hidden layer but I couldn't find one.
Parameters to tweak in SwiftOCR.swift:
- recognizableCharacters (Which characters were / are used for training the NN (see #34))
- globalNetwork: hidden, learningRate, momentum, activationFunction and errorFunction (This only has an effect on training. I only achieved good training results when using .CrossEntropy(average: false) as the errorFunction. If you use the Training App, you have to change them on lines 32 and 97)
- xMergeRadius and yMergeRadius (see #1)
- confidenceThreshold (The confidence for recognizing a character has to be greater than or equal to this threshold. If it is too high, it may filter too much 'noise'; if it is too low, it may not filter enough.)
- //Filter blobs (lines 347 - 360) (This filters the connected components. E.g. if the blob is thinner than 1% of the input image width, then notToThin will be false and the blob will get filtered out.)
- //Filter rects: - Not to small (lines 417 - 429) (The same as //Filter blobs, but it only checks whether the width and height of the blob (after merging) are OK)
- cropSize (line 469) (How big the final image (of the blob) should be for recognition. If you change the cropSize, you have to change the number of inputs of the NN to cropSize.width * cropSize.height + 1)
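Three of the rules in this list can be written down as small functions. This is only a sketch of the rules as described here, not code from SwiftOCR.swift: the `Size` and `Prediction` types, the function names and the 16 x 16 crop size in the usage note are hypothetical.

```swift
/// Hypothetical stand-in for the crop size (SwiftOCR itself uses a platform size type).
struct Size {
    var width: Int
    var height: Int
}

/// Number of NN inputs for a given crop size, following the rule quoted in the list:
/// cropSize.width * cropSize.height + 1.
func inputCount(for cropSize: Size) -> Int {
    return cropSize.width * cropSize.height + 1
}

/// Mirrors the described blob filter: a blob thinner than `minWidthFraction`
/// (1% by default) of the input image width is rejected.
func isWideEnough(blobWidth: Double, imageWidth: Double, minWidthFraction: Double = 0.01) -> Bool {
    return blobWidth >= imageWidth * minWidthFraction
}

/// Hypothetical container for one recognition result.
struct Prediction {
    var character: Character
    var confidence: Double
}

/// Keeps only the characters whose confidence is greater than or equal to the threshold.
func acceptedCharacters(from predictions: [Prediction], confidenceThreshold: Double) -> [Character] {
    return predictions
        .filter { $0.confidence >= confidenceThreshold }
        .map { $0.character }
}
```

For example, under these assumptions `inputCount(for: Size(width: 16, height: 16))` gives the number of inputs the network would need after changing the crop size to 16 x 16, and raising `confidenceThreshold` shrinks the list returned by `acceptedCharacters`.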
Parameters to tweak in SwiftOCRTraining.swift (Only for training):
- trainingImageNames (These images will be used for adding noise in the background when training)
- trainingFontNames (The font names used for training. Only important when you aren't using the Training app)
- numberOfTrainImages and numberOfTestImages (How large the training and testing sets should be)
- errorThreshold (When it should stop training. Only kind of important when you aren't using the Training app)
- //Distortions (lines 236 - 246) (CGAffineTransform: How much the image should get distorted for training)
SwiftOCRDelegate:
- func preprocessImageForOCR(inputImage: OCRImage) -> OCRImage? (Custom image preprocessing)
I think that, for a start, these are more than enough parameters to fiddle with.
Wow! Thanks! I think you might put that on the readme or the wiki of the library. Some people might find it really interesting.
What is usually done when trying to get the best NN is to train tons of them in parallel with different settings and check which one works best. Then you use the best settings to train your network. That is why I asked you for this. 👍
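A minimal sketch of such a search, assuming a plain grid over learning rate, momentum and hidden layer size. The `HyperParameters` type and both functions are hypothetical, and `testError` is a placeholder: in practice it would train a network with the given settings and return its error on the test set. The sketch evaluates candidates one after another; since the candidates are independent of each other, they could just as well be evaluated in parallel.

```swift
/// One combination of training settings to evaluate (hypothetical type).
struct HyperParameters: Equatable {
    var learningRate: Double
    var momentum: Double
    var hiddenNodes: Int
}

/// Builds every combination of the given candidate values.
func grid(learningRates: [Double], momenta: [Double], hiddenSizes: [Int]) -> [HyperParameters] {
    var combinations: [HyperParameters] = []
    for rate in learningRates {
        for momentum in momenta {
            for hidden in hiddenSizes {
                combinations.append(
                    HyperParameters(learningRate: rate, momentum: momentum, hiddenNodes: hidden))
            }
        }
    }
    return combinations
}

/// Returns the candidate with the lowest test error, or nil if there are no candidates.
/// `testError` stands in for "train with these settings, then measure on the test set".
func bestSettings(among candidates: [HyperParameters],
                  testError: (HyperParameters) -> Double)
    -> (settings: HyperParameters, error: Double)? {
    var best: (settings: HyperParameters, error: Double)? = nil
    for candidate in candidates {
        let error = testError(candidate)
        // Keep the current best unless this candidate is strictly better.
        if let current = best, current.error <= error { continue }
        best = (settings: candidate, error: error)
    }
    return best
}
```

The important part is that the comparison uses the test set, not the training set, so that the chosen settings are the ones that generalize best rather than the ones that memorize the training images.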
@garnele007 Does accuracy increase when you provide a higher or a lower number for errorThreshold?
@RollingGoron The lower the number, the more accurate (and time-consuming) the training should get.
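To illustrate why a lower threshold costs more time, here is a toy training loop. It is not Swift-AI's training code: it minimizes a one-dimensional quadratic error with gradient descent plus momentum, and the function name, the `target` value and the `maxEpochs` safety limit are assumptions made for this sketch.

```swift
/// Minimizes error(w) = (w - target)^2 with gradient descent plus momentum and stops
/// as soon as the error is at or below `errorThreshold` (or `maxEpochs` is reached).
/// A toy stand-in for NN training, not the Swift-AI implementation.
func train(learningRate: Double,
           momentum: Double,
           errorThreshold: Double,
           target: Double = 3.0,
           maxEpochs: Int = 10_000) -> (epochs: Int, error: Double) {
    var weight = 0.0
    var velocity = 0.0
    var error = (weight - target) * (weight - target)
    var epochs = 0

    while error > errorThreshold && epochs < maxEpochs {
        let gradient = 2 * (weight - target)
        // Momentum keeps a fraction of the previous step and adds the new gradient step.
        velocity = momentum * velocity - learningRate * gradient
        weight += velocity
        error = (weight - target) * (weight - target)
        epochs += 1
    }
    return (epochs: epochs, error: error)
}

// Same settings, two thresholds: the stricter run never stops earlier than the looser one.
let loose = train(learningRate: 0.1, momentum: 0.4, errorThreshold: 1e-2)
let strict = train(learningRate: 0.1, momentum: 0.4, errorThreshold: 1e-6)
print("loose: \(loose.epochs) epochs, strict: \(strict.epochs) epochs")
```

Both runs follow the same trajectory, so the run with the lower threshold simply continues for longer. Note that the threshold is measured on the training error, so a lower value does not by itself guarantee better results on unseen images.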