
Optimize Training parameters

Open pabloromeu opened this issue 7 years ago • 6 comments

Hi,

I just came across your project and it is awesome! 👍

I have worked with some deep NNs, and I know it is better to try different parameters (learning rate, momentum, etc.) to get better results, and then provide the best trained network (as measured on the test set) to the user. I looked at your training sample and realized that you simply set the parameters to fixed values (0.7 learning rate, 0.4 momentum ...) and use a single hidden layer of fixed size.

I do not know the algorithm you use to recognize the text (BTW: could you provide a source for the algorithm so I can read about it a bit?), but I would like to try a bunch of parameters to see if I can improve the trained weights of the neural network. The problem is that I can't find much documentation on your project. Could you give me some insight into how the algorithm works and why you chose that learning rate, momentum, etc.?

Thanks!

pabloromeu · Jul 23 '16 12:07

Hi there,

Small question. What do you mean by "the algorithm you use to recognize the texts"? Do you mean how I separate the characters, how the NN works or something else?

NMAC427 · Jul 23 '16 12:07

Wow, Thanks for such a fast answer!

I mean how you are separating characters and what you are passing to the input of the NN. I would like to know which components I can tweak to get better character recognition and, for instance, whether a learning rate of 0.1 makes sense or not.

pabloromeu · Jul 23 '16 12:07

I'm using the Swift-AI framework for the NN and a connected-component labeling algorithm to get the bounding boxes of the characters.
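For readers unfamiliar with connected-component labeling, here is a minimal sketch of the idea, assuming the image has already been binarized into a `[[Bool]]` grid (`true` = ink). It uses a 4-connected flood fill and returns one bounding box per component. This is not SwiftOCR's actual code; `BoundingBox` and `connectedComponentBoxes` are hypothetical names, and SwiftOCR's own labeling pass may use a different connectivity or a two-pass union-find approach.

```swift
/// Hypothetical bounding box in pixel coordinates (min and max are inclusive).
struct BoundingBox: Equatable {
    var minX: Int, minY: Int, maxX: Int, maxY: Int
}

/// Labels 4-connected foreground pixels (`image[y][x] == true`) with a flood fill
/// and returns one bounding box per connected component, in scan order.
func connectedComponentBoxes(_ image: [[Bool]]) -> [BoundingBox] {
    let height = image.count
    guard height > 0 else { return [] }
    let width = image[0].count
    var visited = Array(repeating: Array(repeating: false, count: width), count: height)
    var boxes: [BoundingBox] = []

    for y in 0..<height {
        for x in 0..<width where image[y][x] && !visited[y][x] {
            // Start a new component and grow its box while flood-filling.
            var box = BoundingBox(minX: x, minY: y, maxX: x, maxY: y)
            var stack = [(x, y)]
            visited[y][x] = true
            while let point = stack.popLast() {
                let (cx, cy) = point
                box.minX = min(box.minX, cx); box.maxX = max(box.maxX, cx)
                box.minY = min(box.minY, cy); box.maxY = max(box.maxY, cy)
                // Visit the four direct neighbours that are ink and not yet labeled.
                for (nx, ny) in [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
                    where nx >= 0 && nx < width && ny >= 0 && ny < height
                        && image[ny][nx] && !visited[ny][nx] {
                    visited[ny][nx] = true
                    stack.append((nx, ny))
                }
            }
            boxes.append(box)
        }
    }
    return boxes
}
```

Each resulting box would then be filtered and merged (see `xMergeRadius` / `yMergeRadius` below) before being cropped and fed to the network.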

> I looked at your training sample and realized that you simply set the parameters to fixed values (0.7 learning rate, 0.4 momentum ...) and use a single hidden layer of fixed size.

Since this was my first time working with Neural Networks, I searched the Internet for what learning rate and momentum I should use ^^ I would have loved to use a NN library that allows more than one hidden layer but I couldn't find one.
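To get a feel for what the learning rate and momentum do, and for whether a value like 0.1 "makes sense", here is a toy sketch of the classic gradient-descent-with-momentum update on a one-parameter problem, minimizing f(w) = (w - 3)². It is illustrative only: Swift-AI's exact update formula may differ, and `momentumStep` / `minimize` are hypothetical helpers, not library API.

```swift
/// One step of classic gradient descent with momentum:
///   velocity = momentum * velocity - learningRate * gradient
///   weight  += velocity
func momentumStep(weight: inout Double, velocity: inout Double,
                  gradient: Double, learningRate: Double, momentum: Double) {
    velocity = momentum * velocity - learningRate * gradient
    weight += velocity
}

/// Minimizes the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3),
/// starting from w = 0, and returns the final weight.
func minimize(learningRate: Double, momentum: Double, steps: Int) -> Double {
    var w = 0.0
    var v = 0.0
    for _ in 0..<steps {
        // Compute the gradient before passing `w` inout (avoids overlapping access).
        let gradient = 2 * (w - 3)
        momentumStep(weight: &w, velocity: &v, gradient: gradient,
                     learningRate: learningRate, momentum: momentum)
    }
    return w
}
```

On this toy loss, both 0.1 and 0.7 (with momentum 0.4) converge to w = 3, while a learning rate of 1.5 overshoots further on every step and diverges. On a real network, the usable range has to be found empirically, which is why trying several settings is worthwhile.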

Parameters to tweak in SwiftOCR.swift:

  • recognizableCharacters (which characters were / are used to train the NN; see #34)
  • globalNetwork: hidden, learningRate, momentum, activationFunction and errorFunction (these only have an effect on training. I only achieved good training results when using .CrossEntropy(average: false) as the errorFunction. If you use the Training app, you have to change them on lines 32 and 97)
  • xMergeRadius and yMergeRadius (see #1)
  • confidenceThreshold (the confidence for recognizing a character has to be greater than or equal to this threshold. If it is too high, it may filter out too much; if it is too low, it may not filter out enough 'noise'.)
  • //Filter blobs (lines 347-360) (this filters the connected components. E.g., if the blob is thinner than 1% of the input image width, notToThin will be false and the blob will be filtered out.)
  • //Filter rects: - Not to small (lines 417-429) (the same as //Filter blobs, but it only checks whether the width and height of the blob (after merging) are OK)
  • cropSize (line 469) (how big the final image of the blob should be for recognition. If you change the cropSize, you have to change the number of inputs of the NN to cropSize.width * cropSize.height + 1)
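The filtering rules and the input-size rule in the list above can be sketched as plain functions. This is a hedged illustration, not SwiftOCR's code: `Blob`, `filterBlobs`, `acceptedCharacters` and `networkInputCount` are hypothetical names. Only the 1% width fraction, the "greater than or equal to" confidence rule and the `width * height + 1` input count come from the list above.

```swift
/// Hypothetical stand-in for a connected component's size (not a SwiftOCR type).
struct Blob {
    var width: Double
    var height: Double
}

/// Drops blobs thinner than `minWidthFraction` of the image width,
/// mirroring the `notToThin` check described above (1% by default).
func filterBlobs(_ blobs: [Blob], imageWidth: Double, minWidthFraction: Double = 0.01) -> [Blob] {
    return blobs.filter { $0.width >= imageWidth * minWidthFraction }
}

/// Keeps only characters whose confidence is greater than or equal to the threshold.
func acceptedCharacters(_ predictions: [(character: Character, confidence: Float)],
                        confidenceThreshold: Float) -> [Character] {
    return predictions
        .filter { $0.confidence >= confidenceThreshold }
        .map { $0.character }
}

/// Number of NN inputs for a given crop size: cropSize.width * cropSize.height + 1.
func networkInputCount(cropWidth: Int, cropHeight: Int) -> Int {
    return cropWidth * cropHeight + 1
}
```

For example, with a 10 × 12 crop the network would need 121 inputs. Keeping this relationship in one helper makes it harder to change `cropSize` and forget to resize the input layer.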

Parameters to tweak in SwiftOCRTraining.swift (Only for training):

  • trainingImageNames (These images will be used for adding noise in the background when training)
  • trainingFontNames (the font names used for training. Only important when you aren't using the Training app)
  • numberOfTrainImages and numberOfTestImages (how large the training and test sets should be)
  • errorThreshold (when training should stop. Mostly relevant when you aren't using the Training app)
  • //Distortions (lines 236-246) (CGAffineTransform: how much the images should be distorted for training)
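To illustrate what a random training distortion looks like, here is a hedged sketch that builds a small random rotation, scale and horizontal shear. So that it runs without CoreGraphics, it uses a hypothetical `Affine` struct that mirrors CGAffineTransform's layout (x' = a·x + c·y + tx, y' = b·x + d·y + ty). The parameter names `maxRotation`, `maxScaleDelta` and `maxShear` are assumptions, not SwiftOCR's actual distortion settings.

```swift
import Foundation

/// Minimal stand-in for CGAffineTransform: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
struct Affine {
    var a = 1.0, b = 0.0, c = 0.0, d = 1.0, tx = 0.0, ty = 0.0

    func apply(_ p: (x: Double, y: Double)) -> (x: Double, y: Double) {
        return (a * p.x + c * p.y + tx, b * p.x + d * p.y + ty)
    }
}

/// Builds M = scale * Rotation(angle) * HorizontalShear(shear), with each factor
/// drawn uniformly within the given (hypothetical) limits.
func randomDistortion<G: RandomNumberGenerator>(maxRotation: Double, maxScaleDelta: Double,
                                                maxShear: Double, using rng: inout G) -> Affine {
    let angle = Double.random(in: -maxRotation...maxRotation, using: &rng)
    let scale = 1 + Double.random(in: -maxScaleDelta...maxScaleDelta, using: &rng)
    let shear = Double.random(in: -maxShear...maxShear, using: &rng)
    let (cosA, sinA) = (cos(angle), sin(angle))
    // Rotation [[cos, -sin], [sin, cos]] times shear [[1, shear], [0, 1]], then scaled.
    return Affine(a: scale * cosA,
                  b: scale * sinA,
                  c: scale * (cosA * shear - sinA),
                  d: scale * (sinA * shear + cosA),
                  tx: 0, ty: 0)
}
```

Larger limits produce more varied training samples, but distort the glyphs further from anything the network will see at recognition time.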

SwiftOCRDelegate:

  • func preprocessImageForOCR(inputImage: OCRImage) -> OCRImage? (Custom image preprocessing)
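As an example of the kind of work a custom `preprocessImageForOCR` implementation might do, here is a hedged sketch of simple binarization on 8-bit grayscale pixels. Extracting pixel data from an `OCRImage` is platform-specific and is not shown. `binarize` is a hypothetical helper, and the mean-intensity default threshold is an assumption chosen for simplicity, not something SwiftOCR does.

```swift
/// Hypothetical helper: binarizes row-major 8-bit grayscale pixels.
/// Pixels darker than the threshold become 0 (ink); the rest become 255 (paper).
/// If no threshold is given, the mean intensity of the image is used.
func binarize(_ pixels: [UInt8], threshold: UInt8? = nil) -> [UInt8] {
    guard !pixels.isEmpty else { return [] }
    let mean = pixels.reduce(0) { $0 + Int($1) } / pixels.count
    let t = threshold ?? UInt8(mean)
    return pixels.map { pixel -> UInt8 in pixel < t ? 0 : 255 }
}
```

A delegate would convert the result back into an image before returning it. Returning cleaner, high-contrast input tends to make the connected-component step above more reliable.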

I think that, to start with, these are more than enough parameters to fiddle with.

NMAC427 · Jul 23 '16 13:07

Wow! Thanks! I think you might want to put that in the README or the wiki of the library. Some people might find it really interesting.

What is usually done when trying to get the best NN is to train tons of them in parallel with different settings to check which one works best. Then you use the best settings to train your network. That is why I asked you about this. 👍
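The search described above can be sketched as a simple grid search. `Hyperparameters`, `gridSearch` and the `evaluate` closure are hypothetical names. `evaluate` stands in for "train a network with these settings and return its error on the test set", which is where SwiftOCR's training code would plug in.

```swift
/// Hypothetical bundle of the settings being searched over.
struct Hyperparameters: Equatable {
    var hidden: Int
    var learningRate: Double
    var momentum: Double
}

/// Tries every combination of the given values and returns the settings with the
/// lowest test error, or nil if any of the value lists is empty.
func gridSearch(hidden: [Int], learningRates: [Double], momentums: [Double],
                evaluate: (Hyperparameters) -> Double)
    -> (params: Hyperparameters, testError: Double)? {
    var best: (params: Hyperparameters, testError: Double)? = nil
    for h in hidden {
        for lr in learningRates {
            for m in momentums {
                let params = Hyperparameters(hidden: h, learningRate: lr, momentum: m)
                let testError = evaluate(params)
                if testError < (best?.testError ?? .infinity) {
                    best = (params, testError)
                }
            }
        }
    }
    return best
}
```

This sketch runs sequentially for clarity. Because each combination is trained independently, the evaluations could also be spread across threads or machines, as suggested above.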

pabloromeu · Jul 23 '16 13:07

@garnele007 Does a higher or a lower errorThreshold increase accuracy?

RollingGoron · Sep 05 '16 17:09

@RollingGoron The lower the number, the more accurate (and time-consuming) the training should get.
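The trade-off can be illustrated with a hedged sketch of a threshold-based stopping rule: training keeps running epochs until the error drops to `errorThreshold`, with a `maxEpochs` safety cap in case the network never gets that low. `trainUntil` and `trainEpoch` are hypothetical names, and SwiftOCR's actual comparison (`<=` vs `<`) and error measure may differ.

```swift
/// Hypothetical training loop: runs epochs until the error reported by `trainEpoch`
/// is at or below `errorThreshold`, or `maxEpochs` is reached.
/// Returns the number of epochs that were run.
func trainUntil(errorThreshold: Double, maxEpochs: Int, trainEpoch: () -> Double) -> Int {
    var epochs = 0
    while epochs < maxEpochs {
        epochs += 1
        if trainEpoch() <= errorThreshold { break }
    }
    return epochs
}
```

With a simulated error that halves every epoch starting from 1.0, a threshold of 0.1 stops after 4 epochs, while 0.01 needs 7. A lower threshold buys a better fit at the cost of more training time. A very low threshold may never be reached, or may only be reached by overfitting the training data, which is what the test set is there to catch.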

NMAC427 · Sep 06 '16 04:09