Why is my accuracy always 0?
Hi, I'm running the demo with the pretrained model and the following command:

```
python train_text_recognition.py ../small_dataset/curriculum.json ./logs --char-map ../small_dataset/ctc_char_map.json -g=0 -b=64 --model ../model/model_190000.npz --epochs 5 -li 10 --use-serial-iterator -lr 0.000001 --lr-step 0
```

Despite the small learning rate my accuracy is always 0. Can anybody explain this to me?

Which pre-trained model are you using? Are you using the one from bartzi.de?
Yes, I used the text recognition model from bartzi.de and only adapted the paths to the images.
Did you have a look at the `bboxes` directory that was created in the log directory for this training run? What do these images look like (there should be some images if you did not change anything else)?
I looked at the pictures, and as far as I can see, the recognition produces nothing right from the start of training.
These were the first and last pictures from the `bboxes` folder (10.png and 60.png in my case).
Result with a picture from the Synth-90k dataset passed with the `--test-image` option:

I checked both images with this model using the script `text_recognition_demo.py` and all characters were recognized correctly.
Oh now I see the problem^^
Try to use `-r ../model/model_190000.npz` instead of `--model ../model/model_190000.npz`.
It should work then.
Yes, it works. Thank you. But I think you should remove the `--model` parameter, because it is the first thing you pay attention to, but it does not do anything.
Now I've found a new way to get 0 accuracy^^
I've extended the char map with some characters (in particular '%'), collected a new dataset, and I'm not happy with the accuracy of the localization part ('%' is split into two boxes). During training my boxes almost do not move, so I decided that the `--refinement` parameter would help me, but after turning it on all the boxes were gone. What does this parameter actually do, and how can I merge two boxes into one when recognizing '%'?
You are right! I removed that parameter, thx for the hint.
The `--refinement` switch turns on transformation parameter refinement with inverse compositional spatial transformer networks.
I thought this could be used to increase the accuracy of the localization network by iteratively refining its predictions. It turns out that it does help a little, but the memory and runtime costs are too high. So I suggest that you do not use this parameter, unless you want to experiment with it. If you do use it, you should also set `--refinement-steps 2`.
But it is strange that all boxes were gone... did they disappear already in the first iteration, or did it take some time?
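For intuition, the iterative refinement idea can be sketched roughly like this (a minimal NumPy sketch, not the actual implementation; `predict_update` stands in for a hypothetical localization-net call):

```python
import numpy as np

def compose(theta, delta):
    """Compose two 2x3 affine transforms via their 3x3 homogeneous forms."""
    to_h = lambda t: np.vstack([t, [0.0, 0.0, 1.0]])
    return (to_h(theta) @ to_h(delta))[:2, :]

def refine(initial_theta, predict_update, num_steps):
    """Iteratively refine a transformation prediction, as an IC-STN does:
    each step predicts a small correction that is composed with the
    current estimate, instead of predicting the transform from scratch."""
    theta = initial_theta
    for _ in range(num_steps):
        delta = predict_update(theta)  # hypothetical localization-net call
        theta = compose(theta, delta)
    return theta

# toy update that always shrinks the zoom by 10% per step
shrink = lambda theta: np.array([[0.9, 0.0, 0.0], [0.0, 0.9, 0.0]])
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(refine(identity, shrink, 2))  # zoom 0.81 after two steps
```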
I tried with the default `--refinement-steps 1` and they disappeared already in the first iteration. I'll try `--refinement-steps 2` and report the results. Now I have a few questions:
- I'm trying to continue training the model from bartzi.de with an extended charset. Will it recognize the new symbols?
- Can you explain the meaning of some parameters, in particular `--zoom`, `--optimize-all-intervall`, and the factor parameters? Will they help me improve the accuracy of the localization network?
Yes, you can use the model from bartzi.de. I suggest you start the training without initializing the recognition network with the pre-trained model, using the parameter `--load-localization`. The most important part is the localization net; if you start with a good initialization there, you will also get good results with the recognition net. Later you should do some finetuning with all parts of the network initialized from the model. That means you can change the recognition part the way you like and then train using an already trained localization net.
Let's talk about the parameters:
- `--zoom` is a float in the range 0-1 and determines the zoom rate of the uninitialized localization network at the start of training (try setting it to `0.5` and `0.9` and have a look at the images in the `bboxes` folder for the first iteration)
- `--optimize-all-intervall` is a switch that was originally used to test whether it makes sense to optimize the parameters of the recognition network more often than the parameters of the localization network (it turns out it doesn't help). So this is another parameter that I should delete :sweat_smile:
Other interesting parameters:

- `--send-bboxes` turns on sending of bbox images. You can use this switch in conjunction with the `show_progress.py` script
- `--area-factor` determines the strength of the area loss regularizer; you can play around with this to encourage the localization net to predict smaller bboxes
- `--area-scale-factor` determines how the area loss factor changes over time (if used by the loss metric); also a bit legacy, but it could be useful
- `--aspect-factor` is, like the area factor, the weighting factor for adding the aspect ratio loss regularization to the overall loss
Hope that helps =)
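To illustrate what `--zoom` does conceptually: the initial transformation of an untrained localization net can be thought of as a plain scaling transform, where a smaller zoom value means the predicted boxes start out smaller (a hedged sketch; the actual initialization in the code may differ):

```python
import numpy as np

def initial_transform(zoom):
    """Affine sampling-grid parameters for an untrained localization net.
    With zoom < 1 the initial bbox covers only the central part of the
    image; zoom = 1 would cover the full image."""
    return np.array([
        [zoom, 0.0, 0.0],
        [0.0, zoom, 0.0],
    ])

print(initial_transform(0.9))
print(initial_transform(0.5))  # boxes start at half the image size
```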
Thanks, you helped a lot! If you don't mind, I have one more question. I'm new to Chainer and I do not understand how to replace the last classifier layer of the pretrained model, since I use a new char map and my label_size is now 57. I used the following code

```python
import numpy

with numpy.load('path-to-model') as f:
    for key, value in f.items():
        if str(key) == 'recognition_net/classifier/b' or str(key) == 'recognition_net/classifier/W':
            print('{} - {}'.format(key, len(str(value))))
```
to check the size of the classifier layer. I expected to see 52 (the default label_size), but I got
`recognition_net/classifier/b - 632` and `recognition_net/classifier/W - 499`. How do I remove the last layer and what should be the new dimension for my case? Thanks =)
Looks alright so far, but there is an error in your code snippet:
with

```python
print('{} - {}'.format(key, len(str(value))))
```

you are not printing the length of the array, but the length of the stringified version of the array. If you instead do:

```python
print('{} - {}'.format(key, len(value)))
```

you should get 52 as the result for b and also for W :wink:
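To see the difference with toy stand-ins for the stored arrays (the hidden size of 256 is a made-up value; only the first dimension matters here):

```python
import numpy as np

# toy stand-ins for the classifier parameters in the .npz file
b = np.zeros(52, dtype=np.float32)          # recognition_net/classifier/b
W = np.zeros((52, 256), dtype=np.float32)   # hypothetical hidden size 256

print(len(str(b)))  # length of the printed text of the array, not its size
print(len(b))       # 52, the actual number of labels
print(len(W))       # 52 as well, since len() gives the first dimension
```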
I encountered the same problem with

```
python train_text_recognition.py /home/dev2/see/datasets/vin/path.json /home/dev2/see/datasets/vin/logs --char-map ../datasets/fsns/fsns_char_map.json -b=64 -r ../datasets/model/model_190000.npz
```
btw: is this accuracy measured on my new dataset? @Bartzi
@Bartzi Could somebody please provide a minimal example of flags to train text recognition on a new dataset without getting zero accuracy all the time? I have already tried various combinations and still do not get anything (despite the training loss decreasing steadily, which is odd).
Yes, I can give you an example:

```
python train_text_recognition.py <path to curriculum.json> -b 60 --blank-label 0 --char-map ../datasets/textrec/ctc_char_map.json --zoom 0.9 --area-factor 0.1 -lr 1e-4
```

and that should be it...
However, it might take some time until the accuracy goes up, although the loss is decreasing. It could also be that there is a bug in the accuracy calculation method.
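If you suspect the accuracy calculation, you can sanity-check a greedy CTC decode yourself (a rough sketch, not the repo's actual metric; blank label 0 is assumed, as in the command above):

```python
import numpy as np

def greedy_ctc_decode(logits, blank=0):
    """Greedy CTC decoding: collapse repeated labels, then drop blanks."""
    best = np.argmax(logits, axis=1)  # per-timestep best label
    collapsed = [best[0]] + [b for prev, b in zip(best, best[1:]) if b != prev]
    return [label for label in collapsed if label != blank]

def sequence_accuracy(predictions, targets):
    """Fraction of samples whose full decoded sequence matches the target."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# toy example: 5 timesteps, 4 classes (class 0 is the blank)
logits = np.eye(4)[[1, 1, 0, 2, 2]]  # decodes to [1, 2]
print(greedy_ctc_decode(logits))     # [1, 2]
print(sequence_accuracy([greedy_ctc_decode(logits)], [[1, 2]]))  # 1.0
```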