
OCR recognise image on image

Open fab1an2 opened this issue 4 years ago • 4 comments

How do I create a captcha breaker? For example: aaaaa

  1. How do I create more than one output? I need 8 outputs.
  2. Maybe by finding data within data, similar to YOLO.

Oct 23 '19 09:10 fab1an2

@fab1an2 I'll be happy to help, can you please elaborate on what you mean?

  1. By an output of 8, do you mean a single layer of 8 neurons or 8 separate layers?
  2. What do you mean by found data?

Oct 23 '19 10:10 ArtLinkov

@fab1an2 you'll probably need to define an architecture for your network (or use one previously proposed) and define/train the model using shainet.

A common approach is to separate the characters in the image as a preprocessing step and then predict them one by one, which is easy to train provided you have an annotated dataset.

I'm not very familiar with captcha breakers/OCRs, but maybe you could create a CNN -> LSTM, as is done for some image captioning models.

Oct 23 '19 18:10 hugoabonizio

Separating is impossible; many chars overlap other chars, e.g. aaa2wwww. I need to learn the whole string of chars.

  1. The output must be more than 8 neurons. Look, each char comes from an alphabet of 25 different signs. The layers are not important; I am asking only about the output. But the output must be more complicated.
  2. found data = finding an image within an image, i.e. recognising an image inside a bigger image

Oct 24 '19 09:10 fab1an2

I see, well in that case there are a few things to consider:

For image recognition, it is best to use a CNN to identify the chars, as @hugoabonizio mentioned. To choose the output layer size, simply set the last fully-connected layer (the one before the soft-max) to the size you want. Example: cnn.add_fconnect(l_size: 8, activation_function: SHAInet.sigmoid)

Now, in this specific case, your output needs to be 25 neurons, one for each possible char. That is because during training you must give the NN an error for each guess it makes so that it can update its internal parameters, and having 25 errors per single-char guess makes it much faster to train. However, this covers only single-char recognition, so you still need to deal with the fact that there are 8 chars per image. You can employ different tactics to deal with this problem. Every solution has its pros and cons, but you can take the main ideas and combine them into something else. Here are a few examples:
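To make the "25 neurons, 25 errors" idea concrete, here is a minimal framework-agnostic sketch in plain Python (not SHAInet code). The alphabet string, the helper names `one_hot` and `per_neuron_errors`, and the uniform "prediction" are all assumptions for illustration; the thread does not say which 25 signs are actually used.

```python
# Hypothetical 25-symbol alphabet (23 letters + 2 digits); an assumption,
# since the thread only says there are 25 different signs.
ALPHABET = "abcdefghijklmnopqrstuvw23"


def one_hot(char):
    """Return a 25-element target vector with 1.0 at the char's index."""
    target = [0.0] * len(ALPHABET)
    target[ALPHABET.index(char)] = 1.0
    return target


def per_neuron_errors(prediction, target):
    """One error value per output neuron (prediction minus target)."""
    return [p - t for p, t in zip(prediction, target)]


# Made-up network output: an untrained net guessing every char equally.
prediction = [1.0 / len(ALPHABET)] * len(ALPHABET)
target = one_hot("w")
errors = per_neuron_errors(prediction, target)

print(len(errors))  # one error per output neuron
```

Each of the 25 neurons receives its own error signal for a single guess, which is the point being made above about training speed.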

  • Create an output layer with 25x8 neurons, and associate each group of 25 with the corresponding char position.
    • Pros: Simple to implement NN
    • Cons: Hard to deal with chars that are not located in the same place
  • Separate the chars into smaller images prior to feeding them into the model, either by training a different NN to do that or by some other means like more classic edge-detection methods.
    • Pros: Deals with location issue of the chars
    • Cons: Capturing the chars correctly is a challenge in itself, makes the model more complex to implement, labeling needs to be adjusted in the original data
  • Treat the entire sequence as a single output and have a neuron to represent each possible permutation, which basically means having an output layer of 25^8 = 152,587,890,625 neurons for ordered 8-char sequences (10,518,300 if the order of the chars were ignored)
    • Pros: Looking at the entire image as a single input and "reading it" as such, the model is relatively simple
    • Cons: HUGE computation power (both CPU & Memory) needed to train & run
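As a rough illustration of the first and third options, the sketch below encodes an 8-char label as 8 groups of 25 target values, decodes a network output by taking the strongest neuron in each group, and compares the output layer sizes. It is plain Python rather than SHAInet code, and the alphabet and the `encode`/`decode` helpers are hypothetical names chosen for this example.

```python
import math

# Hypothetical 25-symbol alphabet; the thread does not list the real one.
ALPHABET = "abcdefghijklmnopqrstuvw23"
NUM_CHARS = 8


def encode(text):
    """Flatten 8 one-hot groups of 25 into a single 200-element target."""
    if len(text) != NUM_CHARS:
        raise ValueError("expected exactly %d chars" % NUM_CHARS)
    vec = [0.0] * (NUM_CHARS * len(ALPHABET))
    for pos, ch in enumerate(text):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec


def decode(output):
    """Pick the strongest neuron in each group of 25 and map it to a char."""
    chars = []
    for pos in range(NUM_CHARS):
        start = pos * len(ALPHABET)
        group = output[start:start + len(ALPHABET)]
        best = max(range(len(group)), key=group.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(chars)


# Output layer sizes for the options discussed above.
positional_size = NUM_CHARS * len(ALPHABET)      # 25 x 8 layout
ordered_size = len(ALPHABET) ** NUM_CHARS        # one neuron per ordered sequence
unordered_size = math.comb(len(ALPHABET) + NUM_CHARS - 1, NUM_CHARS)  # order ignored

print(decode(encode("aaa2wwww")))
print(positional_size, ordered_size, unordered_size)
```

The positional layout needs only 200 output neurons, while one neuron per sequence grows exponentially with the number of chars, which is why the last option is so expensive.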

I hope this gives you some ideas :)

Oct 24 '19 11:10 ArtLinkov