CNN_LSTM_CTC_Tensorflow
Some problems about this code
In fact, the "max_stepsize" in this code shouldn't be 64. It should be 12, because the original "image_width" (180) is shrunk to 180/2/2/2/2 ≈ 12. Remember that the core idea of CRNN+CTC is to split the image vertically into many slices, predict a class for each slice, and finally use CTC to decode the predicted sequence into the corresponding result. For example, "aaa_bb_c_" and "a__b_ccc" both correspond to the same label "abc". You can also read the paper for more details.
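The decoding rule behind that example can be sketched in a few lines of Python. This is only an illustration of the collapse step (merge repeated symbols, then drop blanks), not the decoder used in this repository; the function name `ctc_collapse` and the use of "_" as the blank symbol are assumptions made for the example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, matching the example strings in the comment above


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-slice prediction path into a label, CTC style."""
    # Step 1: merge runs of identical consecutive symbols ("aaa" -> "a").
    merged = (symbol for symbol, _ in groupby(path))
    # Step 2: drop the blank symbol.
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("aaa_bb_c_"))
print(ctc_collapse("a__b_ccc"))
```

Both paths collapse to the same label. A blank between two identical symbols keeps them separate, so "a_a" collapses to "aa" while "aa" collapses to "a".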
But when I ran the uncorrected code on the author's dataset I got 98% accuracy, while I got a bad result on the VGGWord dataset. I finally got a good result after changing the code.
So why does this code work in your situation? I am very curious about this. Thank you.
@980044579 , thanks for sharing your observations and experience.
- With the great source code in this project and the data provided, I was able to reproduce the author's result, getting 0.997 at the 50th epoch.
- I agree with you on max_stepsize. It should follow the "image_width" direction, which gives 12 in this project. I also plan to correct this and see how it affects the final result. If it's okay, can you share your code changes in this area?
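As a quick check of the number 12: 180/2/2/2/2 is 11.25, so the value depends on how each halving rounds. The sketch below assumes four stride-2 pooling layers that round up (as "SAME"-style padding does); that padding mode and the helper name `steps_after_pooling` are assumptions, not something confirmed in this thread.

```python
import math


def steps_after_pooling(width, num_pools=4, stride=2):
    """Width of the feature map after num_pools stride-2 poolings that round up."""
    for _ in range(num_pools):
        width = math.ceil(width / stride)  # assumed rounding: 180 -> 90 -> 45 -> 23 -> 12
    return width


print(steps_after_pooling(180))
```

Under these assumptions a 180-pixel-wide image yields 12 time steps. If the pooling layers round down instead, the result would be 11, so it is worth printing the actual tensor shape in the model rather than relying on the arithmetic.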
Just change the code between CNN -> RNN in cnn_lstm_otc_ocr.py, and make sure the shape of the RNN input is [batch_size, max_stepsize, num_features].
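To make the target layout concrete, here is a dependency-free sketch of that rearrangement on nested lists. It is not the code from cnn_lstm_otc_ocr.py: the input layout [batch, height, width, channels], the toy sizes, and the function name `cnn_to_rnn_input` are all assumptions for illustration.

```python
def cnn_to_rnn_input(feature_map):
    """Rearrange feature_map[b][h][w][c] into sequence[b][w][h * channels + c].

    The width axis becomes the time axis (max_stepsize); height and channels
    are flattened together into num_features.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    channels = len(feature_map[0][0][0])
    return [
        [
            [sample[h][w][c] for h in range(height) for c in range(channels)]
            for w in range(width)
        ]
        for sample in feature_map
    ]


# Toy feature map (hypothetical sizes): batch 2, height 4, width 12, channels 5.
# Each entry stores its own index so the rearrangement can be inspected.
feature_map = [
    [[[(b, h, w, c) for c in range(5)] for w in range(12)] for h in range(4)]
    for b in range(2)
]
rnn_input = cnn_to_rnn_input(feature_map)
shape = (len(rnn_input), len(rnn_input[0]), len(rnn_input[0][0]))
print(shape)  # (2, 12, 20)
```

With tensors, the same effect is usually achieved by transposing to [batch, width, height, channels] and then reshaping to [batch, width, height * channels].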
Hi @980044579 , thanks a lot for your kind reply. I made the code changes yesterday too and found that the model can reach 0.999 accuracy at the 12th epoch, so the model converges faster and performs better after fixing this bug.
For those who are interested, here are my code changes.
Good job~
I am getting an error: Failed precondition: sequence_length(0) <= 12
For inference, I am using a model that I have already trained up to
model_checkpoint_path: "ocr-model-21001" all_model_checkpoint_paths: "ocr-model-21001"
on a set of 80000 train and 20 val images, as provided in the dataset. I took a few images from the val set and created a folder named infer (40 images named 1.png .. 40.png). I tried to run the code for inference using the command given in the readme.
INFO:tensorflow:Restoring parameters from ./checkpoint/ocr-model-20001
restore from ckpt./checkpoint/ocr-model-20001
2018-01-23 11:16:17.305360: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: sequence_length(0) <= 12
Traceback (most recent call last):
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./main.py", line 184, in
Caused by op 'CTCBeamSearchDecoder', defined at:
File "./main.py", line 184, in
FailedPreconditionError (see above for traceback): sequence_length(0) <= 12
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]
@anubhavrohatgi make sure the maximum label length in your dataset is <= max_stepsize
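A small sketch of that check, assuming the labels are available as plain strings. The helper names `min_ctc_steps` and `find_too_long_labels` are hypothetical and not part of this repository. The check also counts adjacent repeated characters, because CTC needs a blank between two identical neighbours, so a label such as "aab" needs at least 4 time steps.

```python
def min_ctc_steps(label):
    """Smallest number of time steps for which CTC has a valid alignment."""
    # Each pair of identical neighbouring characters needs one blank between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def find_too_long_labels(labels, max_stepsize):
    """Return the labels that cannot be aligned within max_stepsize steps."""
    return [label for label in labels if min_ctc_steps(label) > max_stepsize]


# Hypothetical labels; max_stepsize = 12 as discussed earlier in this thread.
labels = ["12345", "1234567890123", "aab"]
print(find_too_long_labels(labels, max_stepsize=12))
```

The failed precondition in the traceback reads sequence_length(0) <= 12, so the sequence length fed to the decoder appears to be checked against 12 as well; it may be worth confirming that this value is not still set to 64.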
@980044579 Please explain a bit; I am quite new to this stuff in Python. What is the max length of a label?
Currently I am using the dataset that was provided in the link given in the repo. Max_stepsize = 64, I guess, as stated in utils.py.
All images are 180x60.
The error occurs somewhere here: dense_decoded_code = sess.run(model.dense_decoded, feed)
Below are the contents of my infer folder.
Are you talking about labels.txt?
Correct me if I am wrong here: by infer we mean that we are testing on our own real-time data, is that right? If not, please help me: how can I use the model to predict the values of a given input image?
@anubhavrohatgi @980044579 , hello, I ran into the same problem, but I inspected the labels and found that the max label length is not greater than maxT in [maxT, batch_size, num_char]. Have you solved it? I don't know how to do it.
@anubhavrohatgi @kstys make sure you understand how the "CNN + RNN + CTC" framework works; there are some bugs in this code. You should change not only "maxsteps" in utils.py but also the code between CNN -> RNN in cnn_lstm_otc_ocr.py.
I have a question. In cnn_lstm_otc_ocr.py, after the CNN, is x.set_shape([FLAGS.batch_size, filters[3], 24]) right? The time sequence fed to the LSTM should run along the width, but the code uses the number of channels as its length.
I changed the code as @LevinJ did, but I got an error: "tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found."
I set max_step to 128 and my input image is 32*192.