
CTC layer producing infinite losses

Open Deepayan137 opened this issue 6 years ago • 12 comments

Hello, I am trying to train an OCR model that takes a binarized image of a sentence from a document image and tries to predict the text. The losses almost always become infinite after running for a certain number of epochs.

Epochs:[1]/[25]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 215/215 [00:07<00:00, 29.54it/s]
train loss (min, avg, max): (-0.276, nan, 1152.610)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:01<00:00, 67.98it/s]
validation loss (min, avg, max): (inf, nan, -inf)
Epochs:[2]/[25]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 215/215 [00:06<00:00, 31.24it/s]
train loss (min, avg, max): (inf, nan, -inf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:01<00:00, 69.84it/s]
validation loss (min, avg, max): (0.000, nan, 0.000)
Epochs:[3]/[25]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 215/215 [00:06<00:00, 33.08it/s]
train loss (min, avg, max): (1.000, nan, 1.000)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:01<00:00, 70.82it/s]
validation loss (min, avg, max): (inf, nan, -inf)

I am not sure where the error is and would be very grateful if someone could point me in the right direction.

Deepayan137 avatar Mar 09 '18 06:03 Deepayan137

Hi SeanNaren, I am having the same issue of infinite losses. Do you have any idea on this problem?

anonymous2792 avatar Mar 20 '18 09:03 anonymous2792

I had this problem until I did the following:

  • We need to handle variable-sized sequences, so we pad them with zeros.
  • Since the labels need to be a one-dimensional array containing the entire sequence of each item in the batch, I removed the zero-padding from all of the labels and just concatenated them into one long sequence.
  • I then no longer counted the zero-padded entries when calculating the label lengths.
  • I had to switch my optimizer from SGD to Adam. I'm new enough to PyTorch that I don't know if the SGD optimizer ever worked, but in my Keras implementation (I'm doing a bake-off), I found that SGD generalized better than Adam.

I'm using Python 3.6 and PyTorch 0.3.1.
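The padding-removal steps above can be sketched like this (a hypothetical helper, not part of warp-ctc; it assumes 0 is the padding/blank index and never a real label):

```python
import torch

def flatten_labels(padded_labels):
    """Concatenate zero-padded label rows into the 1-D target tensor
    and the per-sample length tensor that warp-ctc expects.

    Assumes index 0 is only ever padding/blank, never a real label.
    """
    # Count only the non-zero (real) entries per sample
    lengths = [int((row != 0).sum()) for row in padded_labels]
    # Drop the padding and concatenate into one long 1-D sequence
    flat = torch.cat([row[row != 0] for row in padded_labels])
    return flat.int(), torch.IntTensor(lengths)

# Example: batch of 2, padded to width 3
padded = torch.IntTensor([[3, 5, 0], [7, 0, 0]])
targets, target_lengths = flatten_labels(padded)
# targets -> [3, 5, 7], target_lengths -> [2, 1]
```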

engrean avatar Apr 17 '18 13:04 engrean

@engrean So we must remove all zeros from the 1-D label tensor? For example, given batch_size=2, T=3, label = [[1,0],[2,2]], we must remove the zero and change it to [1,2,2]?

kobenaxie avatar Apr 19 '18 10:04 kobenaxie

Assuming your zeros mean the _blank that warp-ctc expects and you padded the end of your arrays with zeros, then yes: I removed all trailing zeros and concatenated the non-zero elements into a single array. Your label lengths then need to be [1, 2].
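For the example above, a minimal sketch (assuming 0 is only ever padding):

```python
import torch

# Padded labels from the question: batch_size=2, padded with 0
label = torch.IntTensor([[1, 0], [2, 2]])

# Drop the zero padding and concatenate into one 1-D target tensor
targets = torch.cat([row[row != 0] for row in label])
# Count only the non-zero entries for the per-sample lengths
target_lengths = torch.IntTensor([int((row != 0).sum()) for row in label])
# targets -> [1, 2, 2], target_lengths -> [1, 2]
```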

engrean avatar Apr 19 '18 16:04 engrean

OK, thank you so much~

kobenaxie avatar Apr 20 '18 04:04 kobenaxie

Hello everyone, I just wanted to ask: I trained my OCR model on 4,850 training photos with variable-length character sequences and their ground truths. I had the inf-loss problem and solved it by making the unit step window (the input image width) twice the maximum length of my sequence. Now I get high loss values, around 45 to 46, for both training and validation. Also, if a sequence has 9 characters and my maximum length is 30, I put blanks in the remaining 21 places.

Is this a lack of data, or is the blank padding what causes it? If so, kindly explain the solution above for my case, because I don't get it.

AhmedKhaled945 avatar Sep 14 '18 19:09 AhmedKhaled945

As far as I understand, inf values appear when it is impossible to align the sequences, so the probability calculated by CTC is 0, and its negative log is inf! I had this problem when my dataset contained an example whose length was only slightly less than the network's output length; once you include the blanks that must be inserted between doubled symbols, its effective length exceeded the maximum possible, so no alignment existed. Removing that sample helped. In a second task where I use CTC I have no such examples, yet the loss still becomes inf after 30k iterations. I do not know exactly why, but I will try to tackle it by simply not calling optimizer.step() when the loss is inf.
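That workaround might look like this (a hypothetical training-step fragment, not warp-ctc API; `train_step` and its arguments are made up for illustration):

```python
import math
import torch

def train_step(model, batch, targets, criterion, optimizer):
    """Run one update, but skip it entirely if the loss is inf or nan,
    so a single unalignable batch cannot poison the weights."""
    optimizer.zero_grad()
    loss = criterion(model(batch), targets)
    if not math.isfinite(loss.item()):  # covers both inf and nan
        return None                     # skip this batch
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that skipping the step hides the symptom rather than fixing the cause; filtering out samples whose labels cannot fit in the network output, as described above, is the more direct fix.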

lebionick avatar Aug 23 '19 16:08 lebionick

Thanks for the reply. In my case it was a problem of image dimensions vs. sequence length:

the image width should be at least 2N-1 if the sequence length is N; this is how CTC works.
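As a quick sanity check following this rule (a sketch; the helper name is made up):

```python
def min_input_steps(max_label_len):
    # Conservative bound from the rule above: 2N-1 time steps leave room
    # for a blank between every pair of consecutive label symbols.
    return 2 * max_label_len - 1
```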

AhmedKhaled945 avatar Aug 23 '19 16:08 AhmedKhaled945

@AhmedKhaled945 I.e., you mean that the number of time steps of the RNN (if I am using the standard OCR approach: CNN + LSTM) should be 2 * n - 1, where n is the maximal length of a sequence (text) in my data?

lebionick avatar Aug 23 '19 16:08 lebionick

Yes, as a minimum. If I want to detect a sequence with max length 30, for example, then the input width should be at least 61; it can be more.

AhmedKhaled945 avatar Aug 23 '19 16:08 AhmedKhaled945

@AhmedKhaled945 Thank you

lebionick avatar Aug 23 '19 16:08 lebionick

I'm not an expert in the CTC loss function, but my interpretation of this article suggests that the n in 2n - 1 refers to the number of consecutive repetitions in the target. So, generally: model_output > label_length, and if there are consecutive repetitions in the label, then model_output > label_length + 2n - 1, where n is the number of consecutive repetitions in the label.
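One way to state the feasibility condition directly: a CTC alignment exists only if the number of model time steps is at least the label length plus the number of adjacent repeated symbols (one mandatory blank per repeat). A sketch, with a made-up helper name:

```python
def min_output_steps(label):
    """Minimum number of model time steps for a CTC alignment to exist:
    one step per symbol, plus one blank between each pair of identical
    consecutive symbols."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

# "hello" needs 6 steps: 5 symbols + 1 blank between the repeated l's
assert min_output_steps("hello") == 6
# The [2, 2] label from earlier in the thread needs T >= 3
assert min_output_steps([2, 2]) == 3
```

Samples failing this check are exactly the ones that produce a zero alignment probability and hence an infinite loss.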

dzubke avatar Aug 26 '20 22:08 dzubke