reading-text-in-the-wild
Very poor performance on natural text images? Am I missing something?
I observe that the CHAR2 and DICT models do very poorly when reading text from real-world images. Even the provided examples (easy ones with a plain background) weren't read well by the model, and it did not read any text correctly from any natural scene text images.
Are the models fully trained?
Hi Irtza. The models were not 'trained' via these scripts; the weights were taken from the MATLAB models from the original Jaderberg paper.
The network was trained with images of text that were cropped to only include text. The upstream components of the pipeline are not included with this repo, so you will have to implement those (or a variant thereof) yourself.
The function expects cropped images of text of size 32x100 (and the code resizes the image to this shape if needed).
Are the natural images you are providing to the function already cropped?
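If not, a first pass at the cropping step might look something like this (purely illustrative: the box coordinates are placeholders for whatever your upstream detector returns):

```python
import cv2

# This repo only covers recognition, so word regions must be cropped
# by an upstream detector. The coordinates below are placeholders.
box = (50, 80, 120, 40)  # (x, y, w, h) from your detector of choice

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
x, y, w, h = box
crop = cv2.resize(img[y:y+h, x:x+w], (100, 32))  # cv2 dsize is (width, height)
```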
Thanks for getting back, Dan.
I have fed the model a few natural test text images at roughly 32x100, and the output was not satisfactory. After preprocessing and contrast adjustments it gets a few letters right.
I also tested the included examples; with the CHAR2 model the results I get are as follows:
CondoleezzaRice.jpg outputs -> "CONDEEEEAAAIE."
CMA_CGM.jpg outputs -> "COACGOE."
Have I been able to reproduce the results you had? If not, what could be the problem? I had not changed anything when using the example input images provided. Can you share your results/benchmarks?
Not having MATLAB encouraged me to look into your project, and I want to improve and work on the other parts of the pipeline as well. Can we talk briefly and discuss what can be done to improve it?
Hey Irtza. I will look at this today (i.e., download a fresh copy of the repo and run it locally) to see if I can reproduce your errors.
Thanks, I'll be waiting to hear from you.
Hey Irtza, I can't get to it today, but I will get to it this week (hopefully!)
Hi Dan, did you check it out? Can you give me a few pointers on where the problem could be? I'll work on it over the next week.
Okay, I am reproducing your output exactly.
This issue is almost certainly in the _preprocess() function. I attempted to recreate MATLAB's resize function following http://stackoverflow.com/questions/29958670/how-to-use-matlabs-imresize-in-python
If you play with that function (removing the cast to single precision, etc.), you change the output. I will try to figure out why the ordering of the operations that I chose is not producing the correct output.
NOTE: if you train the weights entirely in python, you won't have this issue, as there will be no conversion problem...
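For reference, here is roughly the shape of preprocessing I am trying to match (a sketch only; the grayscale weights and the zero-centering are my assumptions about the MATLAB steps, not verified against the original code):

```python
import numpy as np
from skimage.transform import resize

def _preprocess_sketch(image, target_shape=(32, 100)):
    """Approximate MATLAB-style preprocessing: grayscale, resize, normalize."""
    # MATLAB's rgb2gray uses the ITU-R 601 luma weights
    if image.ndim == 3:
        image = image @ np.array([0.2989, 0.5870, 0.1140])
    # Cast to single precision, as the MATLAB pipeline does
    image = image.astype(np.float32)
    # skimage's bicubic resize only approximates MATLAB's imresize (which
    # anti-aliases by default); the small numeric differences here can be
    # enough to flip the network's argmax predictions on some images.
    image = resize(image, target_shape, order=3, anti_aliasing=True,
                   preserve_range=True).astype(np.float32)
    # Zero-center, as described in the Jaderberg paper
    image -= image.mean()
    return image
```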
@mathDR I have been playing around with the _preprocess() function, removing the resizing and the cast to single precision etc., and using only fixed-size (32,100) cropped images; the results are still not good.
Did you test the weights after porting them to python? Is it worth retraining the model in python, since we would then be discarding the weights from the authors?
Irtza, porting the weights from MATLAB was always meant to be a stop-gap measure, to ensure that the network in python was constructed correctly. The goal is absolutely to train in Python.
As for the results, I will look and see what my old code was doing. This is really weird, as I had the code "working" (in that it gave correct CHARNET results on the example images).
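To answer the "did you test the ported weights" question directly, a sanity check along these lines would do it (hypothetical: ref.mat stands for activations you export from the MATLAB network yourself; the input layout assumes Theano-style dim ordering):

```python
import numpy as np
from scipy.io import loadmat

def check_port(keras_model, test_input, ref_path="ref.mat"):
    # ref.mat is assumed to hold the MATLAB network's output for the
    # same test_input, e.g. saved in MATLAB with save('ref.mat', 'ref_out').
    ref = loadmat(ref_path)["ref_out"]
    # Theano dim ordering: (batch, channels, height, width)
    out = keras_model.predict(test_input[np.newaxis, np.newaxis, :, :])
    print("max abs difference:", np.abs(out - ref).max())
```

A difference near floating-point noise would mean the port is fine and the problem really is in the preprocessing.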
@mathDR Re-training the model; I'll let you know how it does on the example images.
@Irtza Thanks. If you get the training to work, can you do a PR with the python code (and the steps you took) to train them? Ideally, you would list what hardware you used and the like.
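In outline, I would expect the training script to look something like this (a sketch under heavy assumptions: build_charnet() stands in for however make_keras constructs the network, and X_train/y_train stand in for cropped 32x100 grayscale images with one-hot character labels from the synthetic corpus):

```python
from keras.optimizers import SGD

# Keras 1 style training sketch. build_charnet(), X_train and y_train
# are placeholders, not this repo's actual API. Note the real CHAR model
# emits one softmax per character position, so the loss would apply to
# each output head; this single-loss version is a simplification.
model = build_charnet()
model.compile(optimizer=SGD(lr=0.01, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=128, nb_epoch=10,
          validation_split=0.1)
```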
I am testing this repo and I get the exact outputs that @Irtza mentioned. Have you found a solution to this?
Not exactly related to this issue, but it's along the same lines. I tried a few custom inputs and the results are pretty decent when it comes to letters, but the model is not able to detect even a single number; every number in the input gets detected as a letter. Does anyone have any idea how I can fix this?
Interesting. From what I understand, the training corpus included digits, and the output of the logit function allows for digits.
This might call for retraining on your corpus?
Yeah, I thought that might be a solution. But I am really new to all this, so I don't really know how to retrain. I tried deleting everything completely and then redoing all your steps; it executed wonderfully but it has the same problem. So how can I train it again? You have given the link to the dataset website, which has a ~10 GB dataset. Should I download that? But then how do I use the extract_charnet, make_keras and use_charnet files? If you could point me in the right direction, it would be an amazing help.
Hi, any progress? And is there any other solution for getting digits in the results?
@rahul0302 The fundamental issue won't be resolved until retraining takes place. Basically (as stated above), the issue is that the MATLAB weights were ported to python to take advantage of the heavy lifting that was done to train the model.
The model expects cropped images and a specific preprocessing in MATLAB (which I tried to emulate using python). The more stable thing going forward would be to implement the full training (using the network as built) on the original (massive!) training set.
By doing this, you won't need the keras "hack" of the non-symmetric custom padding.
If I ever get time to revisit this, I will implement the full python training of the non-hacked keras network, but right now I have no time to look at this :-(
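For anyone who does pick this up: newer Keras releases can express non-symmetric padding directly, which is roughly what the custom layer emulates. A sketch, assuming Keras 2's channels-last layout and illustrative pad amounts:

```python
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D

# ZeroPadding2D accepts ((top, bottom), (left, right)) in Keras 2, so
# MATLAB-style asymmetric convolution alignment no longer needs a
# hand-rolled layer. The pad amounts here are illustrative only.
model = Sequential()
model.add(ZeroPadding2D(padding=((0, 1), (0, 1)), input_shape=(32, 100, 1)))
model.add(Conv2D(64, (5, 5), activation="relu"))
```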
@mathDR Hey Dan, I was wondering: if I post a few images, would you mind running them locally and telling me what output you get? I have looked through the code so many times; it has the provisions to detect digits, but it isn't doing so. I have a few fonts that I used which aren't working on my computer. Maybe they'll work on yours?
Hey @rahul0302, yes, please post a few images and I will run them locally and let you know what I get. (I assume I will get the same issues, but let's make sure!)
Yeah, I am attaching a few of them. They are mostly the same numbers, just different fonts. I haven't actually cropped them exactly to 32 by 100. The "Cupertino" one gets recognised completely. I just attached that so you could test it out first.
Hey @rahul0302, I ran all of these images. I tried all the combinations of the preprocessing (using skimage's resize) that I could think of, and the various estimations changed pretty considerably.
Basically, what I am seeing (an oversimplification, I am sure) is that the network is giving softmax probabilities that are all very close (so yielding an 'o' or a 'c' are pretty close, but 'o' is technically higher, so the network yields an 'o'). This is a case where taking a straight maximum of the softmax should be augmented with some type of confidence interval (maybe using Dropout as a variational approximation would help?)
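To sketch what I mean by the Dropout idea (this is the MC-Dropout trick of Gal and Ghahramani; the code assumes the older Keras backend API with a learning-phase flag, and is illustrative rather than something I have tested against this repo):

```python
import numpy as np
from keras import backend as K

def mc_dropout_predict(model, x, n_samples=50):
    # Run with learning_phase=1 so Dropout stays active at test time,
    # then average many stochastic forward passes to get a predictive
    # mean plus a spread we can read as a rough confidence.
    # x: a preprocessed batch, e.g. shape (1, 1, 32, 100).
    f = K.function([model.input, K.learning_phase()], [model.output])
    samples = np.stack([f([x, 1])[0] for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

If the per-class spread swamps the gap between, say, 'o' and '0', the argmax is basically a coin flip and shouldn't be trusted.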
Also, I notice for the pure digit based images:
0 = o
1 = l
2 = z
3 = e
5 = s
etc.
So I think the training needs a LOT more pure digits and mixtures of letters and digits to make reasonable suggestions. Of course, this implies that a pure theano/python/keras training run needs to be done to make this feasible (which I unfortunately don't have time for these days).
Hi, I also get the same issue: poor performance from the model.
What I want to know exactly is: can we change the input size of the neural net?
Or must I re-train the MATLAB model using pure python to set a new input size for the neural net?
Hey @nullphantom, I don't maintain this anymore. If you want to fork it and update the model, that would be lovely! Finding a NN that takes arbitrarily sized inputs would also be a major research achievement, so if you are looking to pursue that, please let me know how it goes!