see
Issue when predicting numbers with text detector pre-trained model
Hello,
I'm experimenting with the text detector for license plate reading. When using the provided pre-trained model, the predictions almost always guess the correct letters but completely flop on the numbers (example below). I suppose this is because the mjsynth dataset consists of words and contains few or no examples with numbers. Is there any way to circumvent this problem? What is the best strategy? I suppose I could use a separate dataset with numbers for transfer learning, but I'm not sure how this could be done with this model. Result for an example image:
OrderedDict([('NJBTITZ', [OrderedDict([('bottom_right', (58.11630630493164, 64.0)), ('top_left', (0.0, 4.503129959106445))]), OrderedDict([('bottom_right', (79.9063949584961, 64.0)), ('top_left', (25.79473304748535, 3.4267578125))]), OrderedDict([('bottom_right', (101.3300552368164, 64.0)), ('top_left', (48.9509391784668, 3.6787776947021484))]), OrderedDict([('bottom_right', (124.90372467041016, 64.0)), ('top_left', (71.83382415771484, 2.9951610565185547))]), OrderedDict([('bottom_right', (147.37278747558594, 64.0)), ('top_left', (93.49320220947266, 2.8387718200683594))]), OrderedDict([('bottom_right', (171.1338348388672, 64.0)), ('top_left', (114.92671966552734, 2.347005844116211))]), OrderedDict([('bottom_right', (194.53089904785156, 64.0)), ('top_left', (136.35043334960938, 2.496673583984375))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (156.9724578857422, 4.3870086669921875))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (173.5450439453125, 7.3213653564453125))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (188.04563903808594, 12.494049072265625))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (198.81907653808594, 16.197669982910156))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 18.550167083740234))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 19.862049102783203))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 20.653759002685547))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 21.19257164001465))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 21.57888412475586))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 21.864521026611328))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.08028793334961))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.24560546875))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.373645782470703))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.47317886352539))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.550762176513672))]), OrderedDict([('bottom_right', (200.0, 64.0)), ('top_left', (200.0, 22.611305236816406))])])])
Thank you in advance
EDIT: OK, I found the --resume option, sorry about that. Now I have a couple of questions (mainly because I'm a beginner):
- I have an artificial dataset with upwards of 100k images. Would the performance be best using transfer learning on the provided model, or training a new one from scratch?
- What's the maximum/optimal batch size I should use with this network? Hardware-wise I can handle fairly large batches, but as I understand it, beyond a certain point large batch sizes hurt the trained model.
Hi,
I think you are experiencing the problem with the numbers for two reasons:
- The network has not been trained on words that contain many numbers, so it has more or less overfitted to not predicting numbers.
- Because we use an LSTM to predict the characters, the LSTM's implicit language model has not learned to predict numbers the way your data requires.
So basically, you can retrain or fine-tune, as you already considered. I think fine-tuning makes more sense for your amount of data; training from scratch might be quite difficult. Regarding the batch size: I'm not sure what the best or worst batch size is. I think going higher than 128 makes no sense and might not even work, because it uses too much memory. A value of around 32 is always a very good choice, and if possible do not go below 20, because that is bad for BatchNorm. Apart from that I cannot really give you good advice, because we did not do any hyperparameter tuning on the batch size.
I hope that helps ;)
Thank you so much. As an experiment I trained two models:
- From scratch: 20k iterations until the loss was around 0.15 and wouldn't go any lower, then more iterations with just the localization weights, as you suggested in another post. This got me to around 90% accuracy, but the model was still not very robust to image variations such as rotated examples.
- Transfer learning on the provided model for 15 epochs with an lr of 1e-4 (around 20k iterations as well). This was much more effective indeed! The already trained weights could locate characters more accurately (also under spatial variations), and the recognition network picked up the numbers pretty well too.
Now, I have found a curious problem:
Whenever detecting words with repeated characters (such as ABC-1222), the output of the demo script cuts out the repeated characters and leaves only the first one (the output of the given example would be 'ABC-12'). You can see a real example here:
OrderedDict([('KFW24', [OrderedDict([('top_left', (0.0, 14.073637008666992)), ('bottom_right', (49.4610710144043, 58.26472473144531))]), OrderedDict([('top_left', (14.820426940917969, 14.722162246704102)), ('bottom_right', (67.1063232421875, 57.84544372558594))]), OrderedDict([('top_left', (36.11481857299805, 15.273567199707031)), ('bottom_right', (88.75239562988281, 57.91472625732422))]), OrderedDict([('top_left', (57.102508544921875, 14.94267463684082)), ('bottom_right', (112.17118835449219, 58.878387451171875))]), OrderedDict([('top_left', (79.5019302368164, 13.340383529663086)), ('bottom_right', (132.2039794921875, 58.51348114013672))]), OrderedDict([('top_left', (103.36588287353516, 12.3841552734375)), ('bottom_right', (154.1340789794922, 58.5925178527832))]), OrderedDict([('top_left', (126.54853057861328, 12.413801193237305)), ('bottom_right', (176.0569305419922, 59.98551940917969))])])])
Nice that it worked that well =)
The problem you are experiencing right now is likely due to the fact that we use CTC-Loss for training the recognition network. This is a known problem with CTC because it tries to collapse multiple predictions of the same letter into one letter as long as there is no blank label prediction in between those predictions. You could try to circumvent this by using beam search decoding instead of greedy decoding, or by using independent softmax classifiers for each possible time step.
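For illustration, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It works on characters instead of class indices and uses "_" as a stand-in for the blank label; both are choices made for readability, not what the see code does. It shows why a run of identical predictions without a blank in between collapses into a single character.

```python
from itertools import groupby


def ctc_greedy_decode(frame_predictions, blank="_"):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    # Step 1: every run of identical consecutive predictions becomes one symbol.
    collapsed = [symbol for symbol, _run in groupby(frame_predictions)]
    # Step 2: remove the blank symbol that separates genuine repeats.
    return "".join(symbol for symbol in collapsed if symbol != blank)


# No blank between the repeated '2's, so they are merged into one.
print(ctc_greedy_decode("ABC-1222"))    # ABC-12
# A blank between repeats keeps them apart.
print(ctc_greedy_decode("ABC-12_2_2"))  # ABC-1222
```

So a CTC-trained network has to learn to emit a blank between two identical characters; if it does not, the repeat is lost at decoding time.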
Hm, I understand now. As far as I know, Chainer has no native beam search, so that would be a little more demanding to implement. Can you give me any pointers on how I would go about changing the code to use independent classifiers? As I said, I'm a beginner, so any light you could shed would be much appreciated.
Yes, Chainer does not have a native beam search implementation; I have also found that unfortunate^^ The funny thing is that I was apparently wrong in my last post: the code for training the text recognition model already uses independent softmax classifiers (see this line).
I think your problem lies in this line. The text recognition demo assumes that you used CTC to train the model; if you did not, the code still strips all repeated occurrences of a character. So if you comment out this line, it should work as expected.
As a nice experiment, you could train with the CTC loss (use this class) and check what difference it makes.
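The difference between the two decoding paths can be sketched as follows. The helper labels_to_text is hypothetical (it is not the demo's code); it assumes class 0 is the blank label and uses the char_map layout from this thread (class index as a string key, Unicode code point as value, decoded with chr). With collapse_repeats=True it behaves like CTC decoding; with False it keeps every non-blank prediction, which is what independent per-timestep classifiers need.

```python
def labels_to_text(labels, char_map, blank_label=0, collapse_repeats=False):
    """Convert per-timestep class predictions into a string (hypothetical helper).

    char_map maps a class index (string key) to a Unicode code point.
    """
    chars = []
    previous = None
    for label in labels:
        is_repeat = label == previous
        previous = label
        if label == blank_label:
            continue  # blank / padding prediction
        if collapse_repeats and is_repeat:
            continue  # CTC-style: drop consecutive duplicates
        chars.append(chr(char_map[str(label)]))
    return "".join(chars)


# Excerpt of a char_map in the format used in this thread (class 0 is the blank).
char_map = {"0": 9250, "2": 49, "3": 50, "11": 45, "12": 65, "13": 66, "14": 67}
# Predictions for "ABC-1222" followed by blank padding.
predictions = [12, 13, 14, 11, 2, 3, 3, 3, 0, 0]

print(labels_to_text(predictions, char_map, collapse_repeats=True))   # ABC-12
print(labels_to_text(predictions, char_map, collapse_repeats=False))  # ABC-1222
```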
You were right. Commenting out the line and using "classifications[0]" instead worked perfectly. Thank you! Finally, I'm trying to evaluate my model using the provided script, and while doing so I noticed that although the accuracy is quite satisfactory for optimal cases, the bounding boxes are not the best:
Are there steps I could take to improve bounding box accuracy/aspect ratio with respect to my data?
Edit: To better illustrate why this is a problem: the box sometimes catches two characters at once, which results in a wrong prediction because the wrong character is recognized.
Nice that it worked!
You are mentioning a problem that I also faced, and so far I have not been able to resolve it in a pleasing manner. One of my first thoughts was to use "Inverse Compositional Spatial Transformers", but I abandoned the idea because it takes too long and too much memory, although it seems to work. This problem is still an open research question; maybe you can find a way?
@glefundes Are you using the train_text_recognition.py script with the model_190000.npz provided by Bartzi?
Yes. I created my own dataset and used the provided model as a base for transfer learning.
@glefundes how you created ground truth csv file ?
I based mine on the one provided by the author. All you need is a column for image paths and a column for ground truths. Don't forget to specify the maximum number of characters/words in the first row, as mentioned in the README (step 3 of the training preparations).
You can write a simple python script using the csv package to do this automatically, by parsing whatever dataset/annotations you're using.
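A minimal sketch of such a script, assuming a tab-delimited file (please verify the delimiter and the header layout against the README). The function name write_gt_file, its parameters and the example image paths are made up for illustration; the two labels are taken from this thread, and the header here uses one character per word, which you may need to adapt.

```python
import csv
import io


def write_gt_file(file_obj, samples, num_words, chars_per_word, delimiter="\t"):
    """Write a header row followed by one (image path, label) row per sample.

    Assumption: tab-delimited; check the README for the format your version expects.
    """
    writer = csv.writer(file_obj, delimiter=delimiter, lineterminator="\n")
    writer.writerow([num_words, chars_per_word])  # header row, e.g. "17<TAB>1"
    for image_path, label in samples:
        writer.writerow([image_path, label])


# Demo with an in-memory buffer; for a real file use open(path, "w", newline="").
samples = [
    ("/data/plates/0.jpg", "MRHDG1840KP033812"),
    ("/data/plates/1.jpg", "MRHRW2840KP060067"),
]
longest = max(len(label) for _path, label in samples)
buffer = io.StringIO()
write_gt_file(buffer, samples, num_words=longest, chars_per_word=1)
print(buffer.getvalue())
```

In practice the samples list would come from parsing whatever annotation format your dataset uses.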
@Bartzi I don't suppose you've had any new ideas on how to handle the bounding box localization limitation we talked about previously? Just got back to the project I was implementing and I'm looking for a possible solution.
One thought I had was to handle this after the localization step but before the recognition, implementing some low-level image processing filter to refine the bounding boxes to whatever pattern I'm looking for (using CCA, histogram analysis or something like that to detect whether subregions were interesting or not) before passing them on to the recognition net. I don't know if it's possible since the networks are fused, but I'm a real newbie when it comes to chainer hahah.
@glefundes I have images in the following format
and a ground truth file like this:
1 1
IMG_0082.JPG OMRHRW2850KP06041
IMG_0089.JPG MRHRW2850KP060420
IMG_0090.JPG XMRHR XMRHRW2850K
IMG_0299.JPG MRHRW18TOKP083013
IMG_0304.JPG MRHRW MRHRW1870KP
IMG_0308.JPG MRHRW1870KP083918
IMG_0315.JPG MRHRW1870KP083921
IMG_0319.JPG MRHRW1870KP083921
IMG_0320.JPG MRHRW1870KP083923
IMG_0324.JPG RHRU5830KP0602090
IMG_0327.JPG MRHRU5830KP060210
IMG_0330.JPG INRHRU5S3OKP06020
Is it correct?
@glefundes I'm still thinking and working on a better way for that. I think your idea can only work if you can do this bbox adjustment in a differentiable way, such that you are able to backpropagate the gradients from the recognition network to the localization network. But you could also just use this image processing as a post processing step after you run the network. But I think the best way would be to have an additional network or something that helps you with refining the box proposal.
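As a rough illustration of the non-differentiable post-processing route, the sketch below uses a vertical projection profile (a simple form of histogram analysis) on a binarized crop of one predicted box and keeps only the run of ink columns closest to the box centre. Everything here is an assumption: the helper names are hypothetical, the crop is assumed to be already binarized with 1 marking character pixels, and picking the central segment is just one possible heuristic.

```python
def column_segments(binary_crop, min_ink=1):
    """Return [start, end) column ranges whose vertical projection contains ink."""
    profile = [sum(column) for column in zip(*binary_crop)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x  # a run of ink columns begins
        elif count < min_ink and start is not None:
            segments.append((start, x))  # the run ends at the first empty column
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def refine_box_columns(binary_crop, min_ink=1):
    """Keep the ink segment closest to the horizontal centre of the crop."""
    if not binary_crop or not binary_crop[0]:
        return None
    segments = column_segments(binary_crop, min_ink)
    if not segments:
        return None
    centre = len(binary_crop[0]) / 2
    return min(segments, key=lambda seg: abs((seg[0] + seg[1]) / 2 - centre))


# Toy crop: part of a neighbouring character on the left, the target on the right.
crop = [
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
]
print(column_segments(crop))     # [(0, 2), (4, 8)]
print(refine_box_columns(crop))  # (4, 8)
```

The returned column range is relative to the crop, so it would have to be shifted by the box's left x coordinate before cropping the input for recognition again.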
@harshalcse Your GT file is not correct. Think about the following:
- Count the characters of the longest word in your dataset and remember that number (let x be that number).
- Create the first line of your gt file like this: x 1
- You may ask why: I said in the README that the first line should provide the following information:
  - The first number gives the number of text lines or words (in this case we handle each character as an independent word, although this is not the case in reality).
  - The second number gives the number of characters per word/line. As we treat each character as its own word, we only have one character per word, hence we write 1.
- The rest should be okay like this.
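To catch this kind of header mismatch before training, a quick sanity check could compare each label's length with the size declared in the first line. This is a hypothetical helper, not part of the repository; it assumes whitespace-separated columns and that a label may use at most (number of words * characters per word) characters.

```python
def check_gt_lines(lines):
    """Report labels that are longer than the header declares (hypothetical helper).

    Assumes: first line "num_words chars_per_word", then "image_path label"
    lines separated by whitespace.
    """
    num_words, chars_per_word = (int(value) for value in lines[0].split())
    capacity = num_words * chars_per_word
    problems = []
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore empty lines
        image_path, label = line.strip().split(None, 1)
        if len(label) > capacity:
            problems.append((line_no, image_path, len(label), capacity))
    return problems


# A header of "1 1" cannot hold a 17-character label:
print(check_gt_lines(["1 1", "IMG_0082.JPG OMRHRW2850KP06041"]))   # [(2, 'IMG_0082.JPG', 17, 1)]
# With "17 1" the same line passes:
print(check_gt_lines(["17 1", "IMG_0082.JPG OMRHRW2850KP06041"]))  # []
```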
@glefundes I have some samples as follows.
I tried to run the following command:
python3 chainer/train_text_recognition.py /tmp/new/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map datasets/textrec/ctc_char_map.json --gpu 0 --snapshot-interval 1000
Error stack:
Traceback (most recent call last):
File "chainer/train_text_recognition.py", line 252, in <module>
test_image = validation_dataset.get_example(0)[0]
File "/root/see-master/chainer/datasets/file_dataset.py", line 142, in get_example
labels = self.get_labels(self.labels[i])
File "/root/see-master/chainer/datasets/file_dataset.py", line 158, in get_labels
labels = [int(self.reverse_char_map[ord(character)]) for character in word]
File "/root/see-master/chainer/datasets/file_dataset.py", line 158, in <listcomp>
labels = [int(self.reverse_char_map[ord(character)]) for character in word]
KeyError: 77
Please help, thanks.
@harshalcse This has to do with the set of different characters in your char map: if I'm not mistaken, it differs from what the code expects. I'm not 100% sure, though. Please double-check the char map you're using.
@glefundes is right, your char_map is not correct. It does not know which class to map to the character with the ASCII code 77, which is chr(77) == 'M'. Please have a look at this explanation for more info about the char_map.
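To find such characters before training, one could compare every character in the annotations with the code points listed in the char_map, mirroring the reverse_char_map[ord(character)] lookup from the traceback. The function name is hypothetical, and the char_map layout (class index as a string key, code point as value) is taken from the char_map posted in this thread.

```python
import json


def missing_characters(char_map, labels):
    """Map each label character that char_map does not cover to its code point."""
    known_codes = set(char_map.values())
    missing = {}
    for label in labels:
        for character in label:
            if ord(character) not in known_codes:
                missing[character] = ord(character)
    return missing


# char_map excerpt: class index (string) -> code point; class 0 is the blank label here.
char_map = json.loads('{"0": 9250, "1": 48, "2": 49, "12": 65}')
print(missing_characters(char_map, ["A0 1M"]))  # {' ': 32, 'M': 77}
```

In practice you would load the real file with json.load and pass in all labels from your gt file.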
@glefundes I tried to train on 795 images with the following command:
python3 chainer/train_text_recognition.py /tmp/small_dataset/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /tmp/small_dataset/ctc_char_map.json --gpu 0 --snapshot-interval 1000
But no .npz file is generated in the log directory.
Please help me generate the .npz file.
@harshalcse The .npz should be generated automatically at intervals defined by the --snapshot-interval argument. I see you set it to 1000. Please check whether your training stops earlier than that, and either let it run longer or reduce the interval.
@glefundes What is the meaning of --snapshot-interval?
@harshalcse The flag --snapshot-interval gives the interval at which a snapshot is taken. So if you set it to 1000, a snapshot of the current model will be created every 1000 training iterations.
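As a worked example with the numbers from the command above (795 images, --batch-size 16, --snapshot-interval 1000), this sketch estimates how many epochs pass before the first snapshot. It assumes that each iteration consumes exactly one batch and that batches run across epoch boundaries, which is how I understand Chainer's repeating iterators to behave; the curriculum may still stop training earlier.

```python
def iterations_per_epoch(dataset_size, batch_size):
    """Average number of iterations (batches) needed to see every sample once."""
    return dataset_size / batch_size


def epochs_until_iteration(iteration, dataset_size, batch_size):
    """Epochs that have passed once `iteration` batches have been consumed."""
    return iteration * batch_size / dataset_size


print(f"{iterations_per_epoch(795, 16):.1f} iterations per epoch")                  # 49.7 iterations per epoch
print(f"first snapshot after {epochs_until_iteration(1000, 795, 16):.1f} epochs")  # first snapshot after 20.1 epochs
```

So with these settings, training has to run for roughly 20 epochs before the first snapshot can appear.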
Right now a trainer_snapshot file is created inside the log directory, but I still don't understand when the .npz file is created.
All I can say is that after snapshot_interval iterations you should get a snapshot of the model (see this line of code).
Training on my dataset now runs, but then I used the following char_map.json, because I want to train the model on alphanumeric characters only:
{
"0": 9250,
"1": 48,
"2": 49,
"3": 50,
"4": 51,
"5": 52,
"6": 53,
"7": 54,
"8": 55,
"9": 56,
"10": 57,
"11": 45,
"12": 65,
"13": 66,
"14": 67,
"15": 68,
"16": 69,
"17": 70,
"18": 71,
"19": 72,
"20": 74,
"21": 75,
"22": 76,
"23": 77,
"24": 78,
"25": 80,
"26": 82,
"27": 83,
"28": 84,
"29": 85,
"30": 86,
"31": 87,
"32": 88,
"33": 89,
"34": 90
}
My gt_word.csv file looks like this:
17 1
/root/small_dataset_2/9999/0.JPG MRHDG1840KP033812
/root/small_dataset_2/9999/1.JPG MRHRW2840KP060067
/root/small_dataset_2/9999/2.JPG MRHDG1847KP033824
/root/small_dataset_2/9999/3.JPG MRHRW2850KP062158
/root/small_dataset_2/9999/5.JPG MRHDG1840KP032255
/root/small_dataset_2/9999/6.JPG MRHRW6830KP102532
/root/small_dataset_2/9999/7.JPG MRHRU5870KP101363
/root/small_dataset_2/9999/9.JPG MRHRU5850KP100742
/root/small_dataset_2/9999/10.JPG MRHRW1850KP081060
/root/small_dataset_2/9999/11.JPG MRHDG1845KP032378
but I got the following error:
format(optimizer.eps))
Exception in main training loop: '35'
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in update_core
loss = _calc_loss(self._master, batch)
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 269, in _calc_loss
return model(*in_arrays)
File "/root/see-master/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
reported_accuracies = self.accfun(self.y, t)
File "/root/see-master/chainer/metrics/textrec_metrics.py", line 47, in calc_accuracy
word = "".join(map(self.label_to_char, word))
File "/root/see-master/chainer/metrics/loss_metrics.py", line 181, in label_to_char
return chr(self.char_map[str(label)])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "chainer/train_text_recognition.py", line 299, in <module>
trainer.run()
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 329, in run
six.reraise(*sys.exc_info())
File "/usr/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in update_core
loss = _calc_loss(self._master, batch)
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 269, in _calc_loss
return model(*in_arrays)
File "/root/see-master/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
reported_accuracies = self.accfun(self.y, t)
File "/root/see-master/chainer/metrics/textrec_metrics.py", line 47, in calc_accuracy
word = "".join(map(self.label_to_char, word))
File "/root/see-master/chainer/metrics/loss_metrics.py", line 181, in label_to_char
return chr(self.char_map[str(label)])
KeyError: '35'
Please help me out with this issue.
With 100 epochs I also get the same issue: the message "Training curriculum has finished. Terminating the training process." appears:
55 5000 4.00388 0 9.96634e-09 3.96396 0 3.96532 0
total [##################################................] 69.23%
this epoch [###################...............................] 38.25%
5000 iter, 55 epoch / 80 epochs
0.097178 iters/sec. Estimated time to finish: 6:21:10.341421.
enlarging datasets
Training curriculum has finished. Terminating the training process.
You can disable the code that causes this and you won't have that problem anymore...
@Bartzi @glefundes The above issue was resolved by increasing the batch size to 128.
But how can I identify that? I just want to train my model on alphanumeric characters, but I also want to train using ctc_char_map.json from textrec. My modified char_map gives the following error:
python3 chainer/train_text_recognition.py /root/small_dataset_51/curriculum.json log --blank-label 0 -b 128 --is-trainer-snapshot --char-map /root/small_dataset_51/ctc_char_map_new.json -g 0 -si 1000 -dr 0.2 -e 5 -lr 1e-8 --zoom 0.9 --area-factor 0.1
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 401, in fetch_batch
batch_ret[0] = [self.dataset[idx] for idx in indices]
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 401, in <listcomp>
batch_ret[0] = [self.dataset[idx] for idx in indices]
File "/usr/lib/python3.5/site-packages/chainer/dataset/dataset_mixin.py", line 67, in __getitem__
return self.get_example(index)
File "/root/see-master/chainer/datasets/file_dataset.py", line 144, in get_example
labels = self.get_labels(self.labels[i])
File "/root/see-master/chainer/datasets/file_dataset.py", line 163, in get_labels
labels = [int(self.reverse_char_map[ord(character)]) for character in word]
File "/root/see-master/chainer/datasets/file_dataset.py", line 163, in <listcomp>
labels = [int(self.reverse_char_map[ord(character)]) for character in word]
KeyError: 32
Exception in main training loop: 'NoneType' object is not iterable
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 232, in update_core
batch = iterator.next()
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 148, in __next__
self.dataset_timeout)
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 417, in measure
self.mem_size = max(map(_measure, batch))
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "chainer/train_text_recognition.py", line 299, in <module>
trainer.run()
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 329, in run
six.reraise(*sys.exc_info())
File "/usr/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 232, in update_core
batch = iterator.next()
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 148, in __next__
self.dataset_timeout)
File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 417, in measure
self.mem_size = max(map(_measure, batch))
TypeError: 'NoneType' object is not iterable
Please help.
@harshalcse Your data points are very similar. Maybe try increasing the dropout ratio and using a batch size of 64. Also, if possible, convert your labels to lowercase for fine-tuning so that you can directly use the pre-trained model, and then convert the predictions back to uppercase.
@harshsp31 thanks for bumping this issue. I totally forgot to answer the last question :sweat_smile:
@harshalcse Please have a close look at the error you got. The first exception tells you that one of your words contains a character that converts to the ASCII code 32. If you look at a code table, you will see that 32 is the code for the space character. The space character is apparently not in your char_map, and that is why it does not work. You have two options:
- add the space character and any other missing characters to your char_map, or
- delete this character from your annotations.
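Both options can be scripted. The following sketch is hypothetical (the helper names are made up): add_characters appends new classes to a char_map with the layout used in this thread, and strip_unknown removes from a label every character that the char_map does not cover. Keep in mind that changing the char_map changes the set of classes the model has to predict.

```python
def add_characters(char_map, characters):
    """Return a copy of char_map with a new class for every character not yet covered."""
    extended = dict(char_map)
    known_codes = set(extended.values())
    next_index = max(int(key) for key in extended) + 1
    for character in characters:
        code = ord(character)
        if code not in known_codes:
            extended[str(next_index)] = code  # next free class index
            known_codes.add(code)
            next_index += 1
    return extended


def strip_unknown(label, char_map):
    """Remove every character from label that char_map cannot encode."""
    known_codes = set(char_map.values())
    return "".join(ch for ch in label if ord(ch) in known_codes)


char_map = {"0": 9250, "1": 48, "12": 65}
print(add_characters(char_map, " A"))  # {'0': 9250, '1': 48, '12': 65, '13': 32}
print(strip_unknown("A 0", char_map))  # A0
```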
@Bartzi I have a small doubt. I know this is not the right place to ask this, but I didn't want to create a separate issue for this. How does the code divide the validation set from the train set if I give the same data folder as train and validation set in curriculum.json?
Yes, you are right this might not be the right place to ask this :wink:, but the answer is: it doesn't.
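So if you want a separate validation set, you have to split the data yourself. A minimal sketch, assuming the ground-truth layout from this thread (a header line followed by one sample per line): the helper name and the 10% default are arbitrary choices, and the header is copied into both outputs.

```python
import random


def split_gt_lines(lines, validation_fraction=0.1, seed=0):
    """Split ground-truth lines into (train, validation); the header goes into both."""
    header, samples = lines[0], list(lines[1:])
    random.Random(seed).shuffle(samples)  # fixed seed for a reproducible split
    num_validation = round(len(samples) * validation_fraction)
    validation = [header] + samples[:num_validation]
    train = [header] + samples[num_validation:]
    return train, validation


lines = ["17 1"] + [f"img_{i}.jpg LABEL{i}" for i in range(10)]
train, validation = split_gt_lines(lines, validation_fraction=0.2)
print(len(train) - 1, len(validation) - 1)  # 8 2
```

You would then write both lists to separate files and point the training and validation entries in curriculum.json at them (check the README for the exact keys).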