keras-ocr
Question about training with empty labels
Hello everyone,
Is it possible to explicitly train the model to recognize 'empty' images?
I have a use-case where the speed of the model is relevant and we already know where to look in the image for text. So we only use the recognizer (and not the detector). However, sometimes (maybe 1%) there is no text in that crop. At the moment, the model mostly generates either empty output (correct) or some random output (incorrect).
I would like to explicitly train the Recognizer with image samples of 'empty' images/words, so that it better learns that it should output an empty string instead of wrong output. However, at the moment, the training does not allow samples with an empty string as ground truth. When I add it to the training I get the error:
/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py in get_batch_generator(self, image_generator, batch_size, lowercase)
373 for c in ''.join(sentences):
374 assert c in self.alphabet, 'Found illegal character: {}'.format(c)
--> 375 assert all(sentences), 'Found a zero length sentence.'
376 assert all(
377 len(sentence) <= max_string_length
AssertionError: Found a zero length sentence.
Does anyone know another way to train with empty labels? Thanks in advance!
One naive way is to send the crop to the detector first and see whether it picks up any text; if it doesn't, you can assume the image is empty. This avoids training a new model, but it adds some inference time.
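A minimal sketch of that gating idea, assuming you pass in two callables of your own: `detect_boxes` (returns the text boxes found in a crop) and `recognize_text` (returns the recognized string). Both names are hypothetical; with keras-ocr they would presumably be thin wrappers around the detector and recognizer you already have loaded.

```python
def recognize_or_empty(crop, detect_boxes, recognize_text):
    """Return '' when the detector finds no text, otherwise run the recognizer.

    detect_boxes and recognize_text are hypothetical callables supplied by the
    caller; they are not part of any library API.
    """
    boxes = detect_boxes(crop)
    if len(boxes) == 0:
        # Nothing detected: treat the crop as empty and skip the recognizer.
        return ""
    return recognize_text(crop)


# Stand-in callables, just to show the control flow.
blank = recognize_or_empty("blank-crop", lambda c: [], lambda c: "xq7")
texty = recognize_or_empty("text-crop", lambda c: [[0, 0, 10, 10]], lambda c: "hello")
```

Since only about 1% of crops are empty, the detector would run on every crop to catch a rare case, so this only makes sense if the extra latency is acceptable.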
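One possibility: instead of labeling the empty image with an empty string, use a special character that never occurs in your real text to represent 'empty', add it to the alphabet, and have the model learn it. At inference time you then map that character back to an empty string.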
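A rough sketch of the sentinel approach, under some assumptions: the training samples are `(image, text)` pairs, `#` does not occur in any real label, and the recognizer is built with the extended alphabet. `EMPTY_TOKEN` and the helper names are made up for illustration. The traceback above shows that every label character must be in the alphabet and that labels must be non-empty, so the sentinel has to be added to the alphabet; a non-whitespace character is the safer choice in case labels get stripped.

```python
EMPTY_TOKEN = "#"  # hypothetical sentinel; must not appear in real labels


def build_alphabet(base_alphabet, empty_token=EMPTY_TOKEN):
    """Append the sentinel to the alphabet, refusing to reuse a real character."""
    if empty_token in base_alphabet:
        raise ValueError("sentinel already used as a real character")
    return base_alphabet + empty_token


def encode_label(text, empty_token=EMPTY_TOKEN):
    """Replace an empty (or whitespace-only) label with the sentinel."""
    return text if text.strip() else empty_token


def decode_prediction(text, empty_token=EMPTY_TOKEN):
    """Map the sentinel back to nothing after recognition."""
    return text.replace(empty_token, "")


def with_empty_token(samples, empty_token=EMPTY_TOKEN):
    """Wrap an (image, text) generator so no label is ever zero-length."""
    for image, text in samples:
        yield image, encode_label(text, empty_token)


alphabet = build_alphabet("abcdefghijklmnopqrstuvwxyz0123456789")
wrapped = list(with_empty_token([("img_a", ""), ("img_b", "hello")]))
```

The wrapped generator would then be what gets passed to `get_batch_generator`, and `decode_prediction` would be applied to whatever the recognizer returns. You would probably want to oversample the empty crops during training, since at 1% they would otherwise barely influence the model.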