Failed on load weights + Zero Accuracy problems
Hi @Bartzi
I successfully trained on my custom dataset (loss below 0.01) through the last epoch with this command:
python train_text_recognition.py mytrain/curriculum.json log \
--blank-label 0 \
--batch-size 64 \
--is-trainer-snapshot \
--use-dropout \
--char-map mytrain/ctc_char_map.json \
--gpu 0 \
--snapshot-interval 1000 \
--dropout-ratio 0.2 \
--epoch 200 \
-lr 0.0001
Then I copied all result files from log to the mytrain folder.
But when I try a specific npz model file with this command:
python text_recognition_demo.py mytrain model_42000.npz mytrain/image/00001.jpg mytrain/ctc_char_map.json --gpu 0
it fails while loading the weights file:
File "text_recognition_demo.py", line 158, in <module>
chainer.serializers.NpzDeserializer(f).load(network)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/serializer.py", line 83, in load
obj.serialize(self)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/link.py", line 954, in serialize
d[name].serialize(serializer[name])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/link.py", line 954, in serialize
d[name].serialize(serializer[name])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/link.py", line 612, in serialize
data = serializer(name, param.data)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/serializers/npz.py", line 151, in __call__
value.set(numpy.asarray(dataset, dtype=value.dtype))
File "cupy/core/core.pyx", line 1696, in cupy.core.core.ndarray.set
File "cupy/core/core.pyx", line 1712, in cupy.core.core.ndarray.set
ValueError: Shape mismatch. Old shape: (52,), new shape: (72,)
FYI, before training I upgraded to chainer 4.2.0 with cupy-cuda90 4.2.0 to enable cuDNN. Is that a problem?
>>> import chainer
>>> chainer.cuda.available
True
>>> chainer.cuda.cudnn_enabled
True
Please help.
@rezha130 I think your problem is this line: you should replace 52 with 72. Your char_map is different from the one I've been using. This problem could be fixed in the same way as in PR #41.
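A mismatch like 52 vs. 72 can be caught before the weights are loaded by comparing the char_map size with the hard-coded output size. This is only a minimal sketch: it assumes the classifier's output size should equal the number of char_map entries, and `check_output_size` is a hypothetical helper, not part of the repository.

```python
import json


def check_output_size(char_map, classifier_size):
    """Return (matches, num_classes) for a loaded char_map dict.

    Assumption: the network needs one output class per char_map entry.
    """
    num_classes = len(char_map)
    return num_classes == classifier_size, num_classes


# Example usage with an in-memory char_map. In practice it would be loaded
# with json.load() from the file passed as --char-map.
example_char_map = json.loads('{"0": 9250, "1": 48, "2": 49}')
matches, num_classes = check_output_size(example_char_map, 52)
if not matches:
    print("char_map has %d classes, but the layer was built for 52" % num_classes)
```

If the two numbers differ, the saved weights and the network definition will not have the same shape.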
@Bartzi thanks for the quick reply, now I can test my model.
But another problem appeared:
python text_recognition_demo.py mytrain model_42000.npz mytrain/image/00001.jpg mytrain/ctc_char_map.json --gpu 0
gives this result:
OrderedDict([('Numbers',
[OrderedDict([('top_left', (10.153651237487793, 0.0)),
('bottom_right', (188.42269897460938, 64.0))]),
OrderedDict([('top_left', (9.257012367248535, 0.0)),
('bottom_right', (188.95077514648438, 64.0))]),
OrderedDict([('top_left', (9.751701354980469, 0.0)),
('bottom_right', (189.06959533691406, 64.0))]),
OrderedDict([('top_left', (16.02237892150879, 0.0)),
('bottom_right', (188.70294189453125, 64.0))]),
OrderedDict([('top_left', (23.43842315673828, 0.0)),
('bottom_right', (188.17893981933594, 64.0))]),
OrderedDict([('top_left', (30.188858032226562, 0.0)),
('bottom_right', (187.6661376953125, 64.0))]),
OrderedDict([('top_left', (35.84349822998047, 0.0)),
('bottom_right', (187.2195281982422, 64.0))]),
OrderedDict([('top_left', (40.32756805419922, 0.0)),
('bottom_right', (186.85638427734375, 64.0))]),
OrderedDict([('top_left', (43.758575439453125, 0.0)),
('bottom_right', (186.5736083984375, 64.0))]),
OrderedDict([('top_left', (46.3254280090332, 0.0)),
('bottom_right', (186.35931396484375, 64.0))]),
OrderedDict([('top_left', (48.2197265625, 0.0)),
('bottom_right', (186.19967651367188, 64.0))]),
OrderedDict([('top_left', (49.60652542114258, 0.0)),
('bottom_right', (186.08200073242188, 64.0))]),
OrderedDict([('top_left', (50.614906311035156, 0.0)),
('bottom_right', (185.99632263183594, 64.0))]),
OrderedDict([('top_left', (51.347171783447266, 0.0)),
('bottom_right', (185.93399047851562, 64.0))]),
OrderedDict([('top_left', (51.879066467285156, 0.0)),
('bottom_right', (185.8885955810547, 64.0))])])])
I expected more words than just the first one.
Did you check those 2 lines? And adjust them to your case?
Your groundtruth is not necessary for using the demo script, but it looks okay to me.
Your problem is that you are using a script that is designed for printing only one word.
I'm not 100% sure, but I think this line could be the solution: remove the [0].
@Bartzi
I removed [0] and got this error:
Traceback (most recent call last):
File "text_recognition_demo.py", line 181, in <module>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
File "text_recognition_ktp.py", line 181, in <lambda>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
KeyError: '[33 28 30]'
@Bartzi Can we create a word-based ground truth file, as @rezha130 has mentioned? Until now I have been following the CSV data structure. Isn't that the only way to present ground truth to the network?
@rezha130 I am really confused. Can you tell me the steps you followed to build your custom dataset? I would be very grateful.
If you use train_text_recognition you can use a word-based ground truth file... oops, yeah, that is a little different from the other scripts... hmm, I'm sorry for that...
The differences from my previous training, now with time_step = 15 and max_char = 16, are in these lines of my create_train.py script:
max_bound_box = "15"
max_chars = "16"
for row in result:
    image_name = row[0]
    label = row[1]
    file.write(os.path.join(train_dir, image_name) + "\t" + label.replace(" ", "\t") + "\n")
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz:()[];&+-/'.,0123456789"
without white space
I remove [0] and get this error:
Traceback (most recent call last):
File "text_recognition_demo.py", line 181, in <module>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
File "text_recognition_ktp.py", line 181, in <lambda>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
KeyError: '[33 28 30]'
You did expect this, didn't you? Once you remove [0] you will of course get an iterable where there was none before... so you'll need to add a loop to the code.
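The KeyError above shows a whole array ('[33 28 30]') being used as a char_map key, so the decoding has to iterate over words first and over characters second. A minimal sketch of such a loop follows. `decode_words` is a hypothetical helper, and skipping the blank label is an assumption about how padding should be treated, not the demo script's actual behaviour.

```python
def decode_words(predictions, char_map, blank_label=0):
    """Convert a sequence of per-word class-index sequences into strings."""
    words = []
    for word in predictions:        # outer loop: one entry per predicted word
        chars = []
        for label in word:          # inner loop: one class index per character
            label = int(label)      # also accepts NumPy integer scalars
            if label == blank_label:
                continue            # assumption: the blank label is padding
            chars.append(chr(char_map[str(label)]))
        words.append("".join(chars))
    return words


# Example with a small char_map in the same format as ctc_char_map.json
# (class index as string -> Unicode code point).
example_map = {"0": 9250, "2": 49, "4": 51, "11": 73, "12": 75, "13": 78}
decoded = decode_words([[13, 11, 12, 0, 0], [4, 2]], example_map)
```

Each inner sequence becomes one string, so the result has as many entries as there are predicted words.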
Yes @Bartzi, you're right. But I'm pretty sure the model predicts only the first word and ignores all following words (word-based ground truth, tab-delimited). This can be seen in the rendered bbox images in the log/boxes folder: the result is just the first word.
Since you said that the train_text_recognition script is designed to recognize only one word, I tried to adjust my custom ground truth files and training approach to that constraint. Now the rendered bbox images show that the model has learnt to recognize all characters defined in the char_map (FYI, the first word is like a title for specific data values and repeats in every training image, so the model can predict it easily), but it looks like training needs more epochs to improve. I can wait for that, since the loss tends to decrease slowly.
The first thing I see is that the predicted bboxes don't look good at all. They should change position after a while; see the text recognition video from this file.
Furthermore, did you have a close look at the implementation of the dataset loader (here)?
Delimiting with tab does not make sense. Sorry if I misunderstood one of your posts regarding the layout of your groundtruth file.
If you struggle with the groundtruth format, you can also create your own dataset loader!
The only thing you need to make sure is that it returns the right data and is a subclass of the DatasetMixin.
The expected return value is a tuple with the loaded image and the label converted from characters to classes, using the char_map. Remember to pad each word according to your maximum of characters per word.
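The label half of that return value can be sketched as follows: characters are mapped to classes through the inverted char_map, and every word is padded to the maximum number of characters. This is only an illustration. `encode_words`, `max_chars` and `num_words` are hypothetical names, and padding with the blank label (0) is an assumption. A custom loader would return these labels together with the loaded image.

```python
def encode_words(words, char_map, max_chars, num_words, blank_label=0):
    """Convert words to padded class-index lists using a char_map dict."""
    # char_map maps class index (as string) -> code point; invert it.
    reverse = {code_point: int(cls) for cls, code_point in char_map.items()}
    if len(words) > num_words:
        raise ValueError("more words than num_words")
    labels = []
    for word in words:
        if len(word) > max_chars:
            raise ValueError("word %r is longer than max_chars" % word)
        classes = [reverse[ord(char)] for char in word]
        classes += [blank_label] * (max_chars - len(classes))  # pad the word
        labels.append(classes)
    while len(labels) < num_words:                              # pad missing words
        labels.append([blank_label] * max_chars)
    return labels


example_map = {"0": 9250, "11": 73, "12": 75, "13": 78}
encoded = encode_words(["NIK"], example_map, max_chars=5, num_words=2)
```

A character that is missing from the char_map raises a KeyError here, which makes alphabet mismatches visible early.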
OK @Bartzi. After checking TextRecFileDataset, I think my ground truth file is still not correct for a multi-word detector.
Can you please send an example of the ground truth files you used for the videos, especially the gt_word.csv files (with an example of how to write num_timesteps, num_labels, file_name and labels) that you used in Text Recognition.mp4 (one word) and FSNS.mp4 (max two words & max three words) with the TextRecFileDataset function?
My case is basically the same as FSNS (detect 2 or 3 text regions, then recognize the characters in every detected bounding box).
Thank you
okay,
okay,
- TextRecFileDataset is not used for training FSNS data.
- here are the first two lines of the text recognition gt_file:
23 1
/data/text_recognition/samples/9999/9999026_]kinkiness_-5_DonegalOne-Regular.jpeg ]kinkiness
- here is an FSNS example for training on 3 text regions
3 21
/mnt/ssd/christian/data/fsns/images/train/00000/0.png 67 12 11 1 5 26 20 21 23 0 0 0 0 0 0 0 0 0 0 0 0 23 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 11 7 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
You should have a look at the FSNS examples, train_text_recognition does some things differently!
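The FSNS-style layout shown above (a header line with two numbers, then one line per image with a path followed by class indices) can be sanity-checked with a small parser. This is a sketch under stated assumptions: the header is read as num_timesteps followed by num_labels, fields are whitespace-separated, paths contain no spaces, and `parse_fsns_groundtruth` is a hypothetical helper rather than the repository's loader.

```python
def parse_fsns_groundtruth(lines):
    """Parse FSNS-style ground truth lines into (file_name, words) samples."""
    header = lines[0].split()
    num_timesteps, num_labels = int(header[0]), int(header[1])
    samples = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        file_name, raw_labels = fields[0], fields[1:]
        expected = num_timesteps * num_labels
        if len(raw_labels) != expected:
            raise ValueError("%s: expected %d labels, got %d"
                             % (file_name, expected, len(raw_labels)))
        labels = [int(value) for value in raw_labels]
        # Split the flat list into one chunk of num_labels per timestep.
        words = [labels[i * num_labels:(i + 1) * num_labels]
                 for i in range(num_timesteps)]
        samples.append((file_name, words))
    return num_timesteps, num_labels, samples


parsed = parse_fsns_groundtruth(["2 3", "a.png 1 2 0 4 0 0"])
```

A line whose label count is not num_timesteps * num_labels is rejected, which is a quick way to spot a malformed row before training starts.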
Hi @Bartzi
Now I am training with train_fsns.py, but I got this error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What does it mean?
hmm, hard to say without the stack trace.
But it basically says that some of the arrays being concatenated do not have the correct shape. This could be because of your input data. Did you make sure to input images with these dimensions: 600x150?
Hi @Bartzi
The input image size is not fixed in my training dataset. I am using the same image dataset as when training with train_text_recognition, which did not produce this kind of error and ran until the last epoch.
This is my ground truth file in FSNS style:
2 16
mytrain/images/0001.jpg 13 11 12 0 0 0 0 0 0 0 0 0 0 0 0 0 4 2 8 3 1 4 7 4 1 3 9 10 1 1 1 7
and char_map.json
{
"0": 9250,
"1": 48,
"2": 49,
"3": 50,
"4": 51,
"5": 52,
"6": 53,
"7": 54,
"8": 55,
"9": 56,
"10": 57,
"11": 73,
"12": 75,
"13": 78
}
I train with this command:
python train_fsns.py curriculum.json log \
--blank-label 0 \
--batch-size 32 \
--is-trainer-snapshot \
--use-dropout \
--char-map char_map.json \
--gpu 0 \
--snapshot-interval 1000 \
--dropout-ratio 0.2 \
--epoch 100
Please check the full stack trace of the error below:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: all the input array dimensions except for the concatenation axis must match exactly
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 229, in update_core
batch = self.converter(batch, self._devices[0])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 133, in concat_examples
[example[i] for example in batch], padding[i])))
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 163, in _concat_arrays
return xp.concatenate([array[None] for array in arrays])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 229, in update_core
batch = self.converter(batch, self._devices[0])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 133, in concat_examples
[example[i] for example in batch], padding[i])))
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 163, in _concat_arrays
return xp.concatenate([array[None] for array in arrays])
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Input images size is not fixed in train data set.
That does not work, because the network is not fully convolutional and because it is not possible to create a batch out of images with different sizes. It worked with train_text_recognition.py because there the input images are resized before being fed to the network.
The FSNS network expects the images to be of shape 600x150. If that is not the shape your data has, you have to adjust the data loading code (and also the network, as your data is likely to be very different from the original FSNS dataset)!
Ok @Bartzi, you're right. I resized all my training images to 600x150 pixels.
But now I get IndexError: list index out of range in calc_loss:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: list index out of range
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
IndexError: list index out of range
What happened?
@rezha130 I think this is because you have more than 3 timesteps in your training set?
Hi @mit456, thanks for helping.
Yes, the previous IndexError: list index out of range in calc_loss happened when I was using this FSNS-style ground truth file:
6 22
mytrain/images/01179.jpg 18 31 43 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 25 22 13 29 5 18 24 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there a maximum for num_timesteps and num_labels in an FSNS-like experiment?
@Bartzi & @mit456
Can I use my custom my_char_map.json for FSNS-like training on my custom dataset, or must I use the fsns_char_map.json that is already provided?
@rezha130 Before you resized your images to 600x150, did you check that they have the same semantics as the images of the FSNS dataset? This is important!!
Forget about the loss_weights in loss_metrics.py, they are not useful for your training. I just used them to make it possible to put some emphasis on certain timesteps of the optimization.
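The traceback fails at `loss_weights[i - 1]`, which suggests a fixed-length weight list being indexed by timestep. If per-timestep weights are kept at all, one way to avoid the IndexError is to derive the list from the number of timesteps. The following is a minimal sketch of that idea. `timestep_loss_weights` and its `emphasis` argument are hypothetical, and this is not the repository's calc_loss.

```python
def timestep_loss_weights(num_timesteps, emphasis=None):
    """Build one loss weight per timestep, uniform unless overridden.

    emphasis: optional dict mapping a timestep index to a custom weight.
    """
    weights = [1.0] * num_timesteps
    for index, weight in (emphasis or {}).items():
        if not 0 <= index < num_timesteps:
            raise IndexError("timestep %d is out of range" % index)
        weights[index] = float(weight)
    return weights


uniform = timestep_loss_weights(6)               # all timesteps weighted equally
emphasised = timestep_loss_weights(4, {2: 2.0})  # extra weight on one timestep
```

With uniform weights the result is equivalent to not weighting the timesteps at all, which matches the advice to drop the weights.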
Technically there is no limit for num_timesteps and num_labels. You can of course use your custom char_map, but you will need to adapt this line and change the number of classes you want to distinguish.
After I added label_size as a parameter in self.classifier = L.Linear(None, label_size), the model can be trained.
num_timesteps = 2
num_labels = 16
main/accuracy = 0.5
until last epoch (100 epochs)
@Bartzi, something is strange in the bounding box result. What happened?
Hi @Bartzi
As mentioned previously, I added label_size as a parameter in self.classifier = L.Linear(None, label_size), so the model can be trained. But only if num_timesteps is 2 or 3!
I'm using this script to get label_size:
with open(args.char_map, 'r') as fp:
    char_map = json.load(fp)
label_size = len(char_map)
But if I try to train with another custom training dataset that has num_timesteps greater than 3, I still get this error:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: list index out of range
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
IndexError: list index out of range
@mit456, have you run into the same error with num_timesteps greater than 3?
@Bartzi please help.
I told you to delete the loss_weights. This will fix your problem, but it still won't help you that much.
I ask again: did you have a look at the FSNS dataset? Did you see that there are always 4 views of the same street name sign shown at the same time? Your data does not have this property, so you cannot use this script without modifications! The fact that your predictions already look quite good is, from my point of view, a hint that the network memorizes your data, and that your data has view variations that make it easy for the recognition network to memorize (i.e. overfit).
Ok @Bartzi, would you please give me the list of py files that I need to modify? Then I can at least focus on debugging some of your scripts, not all of your py files.
First, you will need to change the network definition and the way predictions are made (1, 2). You will also need to change the way metrics/loss are calculated (1, 2). Furthermore, you will need to think about whether you want to use curriculum learning or not, and whether you want to plot the current state of the network for each iteration (if you want to do this, you might need to make changes in the bbox plotter too, or check whether there is one that already works with your way of making predictions and your way of training).
OK @Bartzi
So there are 4 py files that I will try to modify: 2 files for the network definition and 2 files for the loss/metric calculation. I also need to modify the train py script for that.
Now for the bbox plotter: which script in your code should I use if I just want a view of a SINGLE image with a maximum of 2, 3, or more than 4 words/timesteps? I want a plot like this (screenshot from your video), NOT 4 views of the same street name sign shown at the same time (as in the FSNS images):
Sounds good so far :sweat_smile:. You could have a look at all the bbox plotters here, you will see that all special classes inherit from the class BBOXPlotter. A good example could be the SVHN BBOXPlotter.
Hi @Bartzi
If I set the following for calc_loss:
loss_weights = [1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25]
# 16 initial loss weights for max 14 timesteps
just for the longest sample of my custom training dataset, with max num_timesteps = 14, am I correct? How do you adjust the loss_weights values?
And one more thing: would you please explain the difference in purpose between image_size and target_shape in your train script? Why do the recognition network and the BBox plotter use target_shape, while the loss metric calculation uses image_size?
Is it OK if I set them to the same value (also using image_size for resizing the image)?
image_size = Size(width=200, height=40)
target_shape = Size(width=200, height=40)
Btw, for the bbox plotter I'm just using the basic one:
bbox_plotter_class = BBOXPlotter