
Unable to train gapClassifier

Open PR-Iyyer opened this issue 6 years ago • 14 comments

I am getting the following error when trying to train gapClassifier. I get this error with your dataset as well.

Please help.

```
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     22         tmpCost = cost.eval(feed_dict={x: trainBatch,
     23                                        targets: labelBatch,
---> 24                                        keep_prob: 1.0})
     25         print('tempcost=', tmpCost)
     26         trainPlot.updateCost(tmpCost, i // COST_ITER)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py in eval(self, feed_dict, session)
    646
    647     """
--> 648     return _eval_using_default_session(self, feed_dict, self.graph, session)
    649
    650

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py in _eval_using_default_session(tensors, feed_dict, graph, session)
   4756                      "the tensor's graph is different from the session's "
   4757                      "graph.")
-> 4758   return session.run(tensors, feed_dict)
   4759
   4760

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1102             'Cannot feed value of shape %r for Tensor %r, '
   1103             'which has shape %r'
-> 1104             % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
   1105         if not self.graph.is_feedable(subfeed_t):
   1106           raise ValueError('Tensor %s may not be fed.' % subfeed_t)

ValueError: Cannot feed value of shape (64, 3600) for Tensor 'x:0', which has shape '(?, 7200)'
```

PR-Iyyer avatar Mar 09 '18 08:03 PR-Iyyer

Hi,

your code was probably set up for the gapdata/large dataset, which uses larger images (60x120 px). If you want to use the smaller size, you have to change the size of the input placeholder. Just change this line: `x = tf.placeholder(tf.float32, [None, 7200], name='x')` to `x = tf.placeholder(tf.float32, [None, 3600], name='x')`. The images are flattened, so the resulting size is 60 x 60 = 3600.
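Roughly, the shapes have to line up like this (just a sketch, the batch here is made up):

```python
import numpy as np
import tensorflow as tf

IMG_H, IMG_W = 60, 60          # the small gapdata images
FLAT = IMG_H * IMG_W           # 3600 values per flattened image

x = tf.placeholder(tf.float32, [None, FLAT], name='x')

# A made-up batch of 64 images, flattened to (64, 3600) -- this is
# exactly the shape the error says you were feeding into 'x'
batch = np.zeros((64, IMG_H, IMG_W), dtype=np.float32)
flat_batch = batch.reshape(-1, FLAT)
```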

Hope this helps, feel free to ask if there is anything else.

Breta01 avatar Mar 09 '18 10:03 Breta01

Sure, thank you so much. Actually, I got that fixed, but then I got a dimension error. I tried changing it to `reshape_images = tf.reshape(x, [-1, 32, 2, 1])`, which let me start training with your data.

But for my own data, it gives the following error:

```
InvalidArgumentError: Input to reshape is a tensor with 32400 values, but the requested shape requires a multiple of 64
	 [[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_x_0_2, Reshape/shape)]]
```

PR-Iyyer avatar Mar 09 '18 11:03 PR-Iyyer

Oh, you are reshaping it wrongly. Reshape it to: `tf.reshape(x, [-1, 60, 60, 1])`
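The arithmetic has to invert the flattening, something like this (a sketch, not the exact notebook code):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 3600], name='x')

# Correct: 60 * 60 * 1 == 3600 values per image, so -1 resolves to the
# batch size and nothing spills over
reshaped_images = tf.reshape(x, [-1, 60, 60, 1])

# Wrong: [-1, 32, 2, 1] needs the total value count to be a multiple of
# 32 * 2 = 64, which is why your 32400-value tensor failed
```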

Breta01 avatar Mar 09 '18 11:03 Breta01

Actually, I did that before and got the following error. That is why I tried `[-1, 32, 2, 1]`.

```
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [32,2] and labels shape [64]
	 [[Node: sparse_softmax_cross_entropy_loss/xentropy/xentropy = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](add_3, _arg_Placeholder_0_0)]]
```

PR-Iyyer avatar Mar 09 '18 12:03 PR-Iyyer

Ok, I think I may know the problem. Could you please share the code you are running (using a gist or something similar)?

Breta01 avatar Mar 09 '18 14:03 Breta01

I fixed some bugs in loading images and added a settings section. In the settings you should be able to edit the size of the slider and other variables.

If you want to use your own data, create a folder in data/gapdet/large/ and place your images there, named label_timestamp.jpg (where label is 0 or 1). Images should be 60x120 px; the final crop is done by the slider variable in the code (the height is fixed to 60 px right now).
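Saving the samples could look something like this (just a sketch; the `crops` list, the subfolder name, and the use of OpenCV are my assumptions, not the repo's code):

```python
import time
import cv2
import numpy as np

# Hypothetical samples: (image, label) pairs, label is 0 or 1
crops = [(np.zeros((60, 120), dtype=np.uint8), 0)]

for img, label in crops:
    assert img.shape[:2] == (60, 120)   # 60x120 px before the slider crop
    # Folder name "my_data" is made up; label goes first, then a timestamp
    name = 'data/gapdet/large/my_data/%d_%d.jpg' % (label, int(time.time() * 1000))
    cv2.imwrite(name, img)
```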

Breta01 avatar Mar 09 '18 17:03 Breta01

Thank you so much. Let me check it out and I shall update you as soon as possible.

PR-Iyyer avatar Mar 10 '18 19:03 PR-Iyyer

How do you prepare your own gap classifier data?

annish07 avatar Jun 21 '18 11:06 annish07

First, it depends on which gap classifier you want to train. I would recommend training GapClassifier-BiRNN.ipynb because it gives the best accuracy. For training this model you need the data provided in the words2 folder. This folder contains images along with text files (with the same name), each containing an array of positions (x-coordinates) of the vertical lines separating letters.
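Loading one pair could look roughly like this (a sketch; the file names are made up and I am assuming the positions file is plain whitespace-separated integers, not necessarily the repo's exact format):

```python
import numpy as np
import cv2

# Hypothetical names; the annotation file shares the image's base name
img = cv2.imread('data/words2/sample.png', cv2.IMREAD_GRAYSCALE)
with open('data/words2/sample.txt') as f:
    # Assumed format: whitespace-separated x-coordinates of the gap lines
    gaps = np.array([int(v) for v in f.read().split()])

print(img.shape, gaps)   # (height, width) and e.g. [12 34 56 ...]
```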

To extend this folder you can use the WordClassDM.py script, where you specify the data folder as the folder containing raw word images. The script loads and normalizes the images and then shows them; you can then manually (click and drag) place lines where the letters should be separated. The lines, together with the image, are then saved by pressing the s key.

Breta01 avatar Jun 28 '18 12:06 Breta01

So basically the gap classifier will predict where my gap is, right? If so, then why are you using a slider in there?


annish07 avatar Jun 28 '18 13:06 annish07

It is because I am not predicting the array of x-coordinates; I am predicting whether or not there is a gap on the slide. I think this is more efficient than predicting the array, but you can try it the other way.

Right now, I am feeding an array of images (slides) into the classifier, and I use the slider to extract these images from the word image. These slides (patches) overlap and are processed by a CNN before they are fed into the RNN, which evaluates each slide for whether or not there is a gap. If you want, you can replace this with a CNN that extracts the slides (patches), or with the tf.extract_image_patches function. But you would have to change the code a bit more to predict the array of x-coordinates.
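A rough sketch of the tf.extract_image_patches variant (the slide sizes and strides are just example values, not the notebook's settings):

```python
import tensorflow as tf

# Word images: [batch, height, width, channels]
images = tf.placeholder(tf.float32, [None, 60, 120, 1])

# Overlapping vertical slides, e.g. 60x30 px windows moved 2 px at a time
slides = tf.extract_image_patches(
    images,
    ksizes=[1, 60, 30, 1],   # slide height x width
    strides=[1, 1, 2, 1],    # horizontal step of 2 px
    rates=[1, 1, 1, 1],      # no dilation
    padding='VALID')
# slides: [batch, 1, n_slides, 60*30*1]; each slide would then go
# through the CNN before the RNN, as described above
```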

Breta01 avatar Jun 28 '18 13:06 Breta01

Thank you for the insights.


annish07 avatar Jun 28 '18 13:06 annish07

What is this doing in GapClassifier-BiRNN? Can you please explain it briefly?

```python
ind = indices[i] + (-(offset % 2) * offset // 2) + ((1 - offset % 2) * offset // 2)
```


annish07 avatar Jul 10 '18 10:07 annish07

Yes, it looks a little bit strange. First, in the line `targets_seq[i] = np.ones((length[i])) * NEG`, the target sequence is made the same length as the image sequence and represents a label for each image in the sequence. targets_seq is initialized with the negative label, so I have to calculate the indexes of the positive labels and change those, as you can see in the line `targets_seq[i][ind] = POS`. In the line you are referring to, I was experimenting with creating more positive labels around each ground-truth label.

For example, say you specify gap_span = 3 (3 positive labels for each ground-truth label). indices[i] stores the indexes of the ground-truth labels. In the first iteration of the loop, offset is 0, so the indices are unchanged. In the second, offset is 1, so -1 is added to each ground-truth index. In the third, offset is 2, so 1 is added to each ground-truth index (for a higher gap_span it continues as -2, 2, -3, 3, and so on). The trick to notice is that in Python -1 // 2 == -1 (not zero).
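Here is the same logic as a standalone snippet (the NEG/POS values and the example numbers are made up) so you can see the pattern directly:

```python
import numpy as np

NEG, POS = 0, 1                      # assumed label values
length = 30                          # example sequence length
indices = np.array([5, 20])          # example ground-truth gap indexes
gap_span = 3                         # positive labels per ground truth

targets_seq = np.ones(length) * NEG  # one label per slide, all negative
for offset in range(gap_span):
    # offsets come out as 0, -1, 1, -2, 2, ... around each ground truth
    ind = indices + (-(offset % 2) * offset // 2) + ((1 - offset % 2) * offset // 2)
    targets_seq[ind] = POS

print(np.where(targets_seq == POS)[0])  # -> [ 4  5  6 19 20 21]
```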

Breta01 avatar Jul 12 '18 15:07 Breta01