Why is my accuracy always 0?
Hi, I'm running the demo with the pretrained model and the following command:

```
python train_text_recognition.py ../small_dataset/curriculum.json ./logs --char-map ../small_dataset/ctc_char_map.json -g=0 -b=64 --model ../model/model_190000.npz --epochs 5 -li 10 --use-serial-iterator -lr 0.000001 --lr-step 0
```

Despite the small learning rate my accuracy is always 0. Can anybody explain this to me?

Which pre-trained model are you using? Are you using the one from bartzi.de?
Yes, I used the text recognition model from bartzi.de and only adapted the paths to the images.
Did you have a look at the `bboxes` directory that was created in the log directory for this training run? What do these images look like (there should be some images if you did not change anything else)?
I looked at the pictures, and as far as I can see, the recognition produces nothing right from the start of training.
These were the first and last pictures from the `bboxes` folder (10.png and 60.png in my case).
Result with a picture from the Synth-90k dataset passed with the `--test-image` option:

I checked both images with this model using the script `text_recognition_demo.py` and all characters were recognized correctly.
Oh now I see the problem^^
Try to use `-r ../model/model_190000.npz` instead of `--model ../model/model_190000.npz`.
It should work then.
Yes, it works. Thank you. But I think you should remove the `--model` parameter, because it is the first thing you pay attention to, but it does not do anything.
Now I've found a new way to get 0 accuracy^^
I've extended the char map with some characters (in particular '%'), collected a new dataset, and I'm not happy with the accuracy of the localization part ('%' is split into two boxes). During training my boxes almost do not move, so I decided that the `--refinement` parameter would help me, but after turning it on all the boxes were gone. What does this parameter actually do, and how can I merge two boxes into one when recognizing '%'?
You are right! I removed that parameter, thx for the hint.
The `--refinement` switch turns on transformation parameter refinement with inverse compositional spatial transformer networks.
I thought this could be used to increase the accuracy of the localization network by iteratively refining its predictions. It turns out that it does help a little, but the memory and runtime costs are too high. So I suggest that you do not use this parameter, unless you want to experiment with it. If you do use it, you should also set `--refinement-steps 2`.
But it is strange that all boxes were gone... did they disappear already in the first iteration, or did it take some time?
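For intuition, the iterative refinement idea can be sketched roughly like this (a minimal NumPy sketch, not the actual implementation; `predict_update` stands in for a hypothetical localization-net call):

```python
import numpy as np

def compose(theta, delta):
    """Compose two 2x3 affine transforms via their 3x3 homogeneous forms."""
    to_h = lambda t: np.vstack([t, [0.0, 0.0, 1.0]])
    return (to_h(theta) @ to_h(delta))[:2, :]

def refine(initial_theta, predict_update, num_steps):
    """Iteratively refine a transformation prediction, as an IC-STN does:
    each step predicts a small correction that is composed with the
    current estimate, instead of predicting the transform from scratch."""
    theta = initial_theta
    for _ in range(num_steps):
        delta = predict_update(theta)  # hypothetical localization-net call
        theta = compose(theta, delta)
    return theta

# toy update that always shrinks the zoom by 10% per step
shrink = lambda theta: np.array([[0.9, 0.0, 0.0], [0.0, 0.9, 0.0]])
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(refine(identity, shrink, 2))  # zoom 0.81 after two steps
```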
I tried with the default `--refinement-steps 1` and they disappeared already in the first iteration. I'll try `--refinement-steps 2` and report the results. Now I have a few questions:
- I'm trying to continue training the model from bartzi.de with an extended charset. Will it recognize the new symbols?
- Can you explain the meaning of some parameters, in particular `--zoom`, `--optimize-all-intervall`, and the factor parameters? Will they help me improve the accuracy of the localization network?
Yes, you can use the model from bartzi.de. I suggest you start the training without initializing the recognition network with the pre-trained model, using the parameter `--load-localization`. The most important part is the localization net; if you start with a good initialization there, you will also get good results with the recognition net. Later you should do some finetuning with all parts of the network initialized from the model. That means you can change the recognition part the way you like and then train using an already trained localization net.
Let's talk about the parameters:
- `--zoom` is a float in the range 0-1 and determines the zoom rate of the uninitialized localization network at the start of training (try setting it to `0.5` and `0.9` and have a look at the images in the `bboxes` folder for the first iteration)
- `--optimize-all-intervall` is a switch that was originally used to test whether it makes sense to optimize the parameters of the recognition network more often than the parameters of the localization network (it turns out it doesn't help). So this is another parameter that I should delete :sweat_smile:
Other interesting parameters:

- `--send-bboxes` turns on sending of bbox images. You can use this switch in conjunction with the `show_progress.py` script
- `--area-factor` determines the strength of the area loss regularizer; you can play around with this to encourage the localization net to predict smaller bboxes
- `--area-scale-factor` determines how the area loss factor changes over time (if used by the loss metric); also a bit legacy, but it could be useful
- `--aspect-factor` is, like the area factor, the weighting factor for adding the aspect ratio loss regularization to the overall loss
Hope that helps =)
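To illustrate what `--zoom` does conceptually: the initial transformation of an untrained localization net can be thought of as a plain scaling transform, where a smaller zoom value means the predicted boxes start out smaller (a hedged sketch; the actual initialization in the code may differ):

```python
import numpy as np

def initial_transform(zoom):
    """Affine sampling-grid parameters for an untrained localization net.
    With zoom < 1 the initial bbox covers only the central part of the
    image; zoom = 1 would cover the full image."""
    return np.array([
        [zoom, 0.0, 0.0],
        [0.0, zoom, 0.0],
    ])

print(initial_transform(0.9))
print(initial_transform(0.5))  # boxes start at half the image size
```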
Thanks, you helped a lot! If you don't mind, I have one more question. I'm new to Chainer and I do not understand how to replace the last classifier layer of the pretrained model, since I use a new char map and my label_size is now 57. I used the following code

```python
import numpy

with numpy.load('path-to-model') as f:
    for key, value in f.items():
        if str(key) == 'recognition_net/classifier/b' or str(key) == 'recognition_net/classifier/W':
            print('{} - {}'.format(key, len(str(value))))
```
to check the size of the classifier layer. I expected to see 52 (the default label_size), but I got
`recognition_net/classifier/b - 632` and `recognition_net/classifier/W - 499`. How do I remove the last layer and what should be the new dimension for my case? Thanks =)
Looks alright so far, but there is an error in your code snippet:
with

```python
print('{} - {}'.format(key, len(str(value))))
```

you are not printing the length of the array, but the length of the stringified version of the array. If you instead do:

```python
print('{} - {}'.format(key, len(value)))
```

you should get 52 as the result for b and also for W :wink:
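To see the difference with toy stand-ins for the stored arrays (the hidden size of 256 is a made-up value; only the first dimension matters here):

```python
import numpy as np

# toy stand-ins for the classifier parameters in the .npz file
b = np.zeros(52, dtype=np.float32)          # recognition_net/classifier/b
W = np.zeros((52, 256), dtype=np.float32)   # hypothetical hidden size 256

print(len(str(b)))  # length of the printed text of the array, not its size
print(len(b))       # 52, the actual number of labels
print(len(W))       # 52 as well, since len() gives the first dimension
```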
I encountered the same problem with

```
python train_text_recognition.py /home/dev2/see/datasets/vin/path.json /home/dev2/see/datasets/vin/logs --char-map ../datasets/fsns/fsns_char_map.json -b=64 -r ../datasets/model/model_190000.npz
```
btw: is this accuracy measured on my new dataset? @Bartzi
@Bartzi Could somebody please provide a minimal example of flags to train text recognition on a new dataset without getting zero accuracy all the time? I have already tried various combinations and still do not get anything (despite the training loss decreasing steadily, which is odd).
Yes, I can give you an example:

```
python train_text_recognition.py <path to curriculum.json> -b 60 --blank-label 0 --char-map ../datasets/textrec/ctc_char_map.json --zoom 0.9 --area-factor 0.1 -lr 1e-4
```

and that should be it...
However, it might take some time until the accuracy goes up, although the loss is decreasing. It could also be that there is a bug in the accuracy calculation method.
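If you suspect the accuracy calculation, you can sanity-check a greedy CTC decode yourself (a rough sketch, not the repo's actual metric; blank label 0 is assumed, as in the command above):

```python
import numpy as np

def greedy_ctc_decode(logits, blank=0):
    """Greedy CTC decoding: collapse repeated labels, then drop blanks."""
    best = np.argmax(logits, axis=1)  # per-timestep best label
    collapsed = [best[0]] + [b for prev, b in zip(best, best[1:]) if b != prev]
    return [label for label in collapsed if label != blank]

def sequence_accuracy(predictions, targets):
    """Fraction of samples whose full decoded sequence matches the target."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# toy example: 5 timesteps, 4 classes (class 0 is the blank)
logits = np.eye(4)[[1, 1, 0, 2, 2]]  # decodes to [1, 2]
print(greedy_ctc_decode(logits))     # [1, 2]
print(sequence_accuracy([greedy_ctc_decode(logits)], [[1, 2]]))  # 1.0
```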