Pretrained model training set contains IIIT5k images too?

Open NightFury13 opened this issue 7 years ago • 11 comments

I ran the demo code with the pretrained model and I get around 86% word-level accuracy on the IIIT5k dataset, whereas the paper reports ~80%. Were there any other pre- or post-processing steps involved in training the provided pretrained model?

NightFury13, Jan 10 '17
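Word-level accuracy is sensitive to the evaluation protocol, so figures like 86% and ~80% are only comparable when predictions and labels are normalized the same way. A minimal sketch follows, assuming a commonly used convention (case-insensitive comparison on alphanumeric characters only) next to a strict exact-match variant. This is not necessarily the protocol used in the paper or in the demo code.

```python
import re
from typing import Sequence


def normalize(word: str) -> str:
    """Assumed protocol: lowercase, then keep only ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", word.lower())


def word_accuracy(
    predictions: Sequence[str],
    ground_truths: Sequence[str],
    case_sensitive: bool = False,
) -> float:
    """Fraction of words whose prediction matches the ground truth exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must be aligned")
    if not ground_truths:
        return 0.0
    # Strict mode compares raw strings; default mode compares normalized strings.
    key = (lambda w: w) if case_sensitive else normalize
    correct = sum(key(p) == key(g) for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


preds = ["Hello", "world!", "crnn", "text"]
truth = ["hello", "World", "cnn", "text"]
print(word_accuracy(preds, truth))                       # 0.75
print(word_accuracy(preds, truth, case_sensitive=True))  # 0.25
```

The same four predictions score 75% or 25% depending only on the matching rule, which is why the protocol should be fixed before comparing numbers.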

@NightFury13 The released model should deliver results close to those reported in the paper. Our best result did not reach 86% on IIIT5k. We did not include the IIIT5k training data either.

bgshih, Jan 10 '17

Hi @bgshih ,

I read your paper, and there are some points I wasn't clear on.

  1. For training, do I have to use cropped words, or can I train on natural scene images that contain text? I ask because I tested the demo model provided here on natural scene images, and it wasn't able to recognise multiple words, or even a single word.

  2. My research area is identifying non-dictionary words in raw images taken from mobile vans and using them to update OpenStreetMap, so I will mostly be encountering names of shops and points of interest. To tackle this problem, I thought of using py-faster-rcnn to first detect the words in the image and then train your model on the cropped words. But reading through your paper, I felt there is no need for a separate word detection step and that your model can actually be trained on natural images.

  3. Finally, is image captioning a similar kind of problem, the only difference being that it doesn't involve text? Overall, the sense I got is that the model learns features of the word that help it distinguish between individual letters, which is what lets it recognise even lexicon-free words.

It would be helpful to know whether my interpretation of your model is correct, and a bit of guidance would be appreciated.

Thanks

rremani, Jan 23 '17

@rremani 1 and 2: CRNN is only for cropped words. For whole images with much more background, a text detection method is required to detect the text first. If you feed whole images directly into CRNN, the text is buried in background noise. 3: I think it is possible to apply an image captioning model to this problem, but I am concerned about the order you set for the output words. In image captioning, the words are naturally ordered by grammar, but for text detection you need to define that order meaningfully.

bgshih, Jan 24 '17
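Point 3 of the question asks how CRNN manages to read words that are not in any dictionary. In the CRNN paper, the transcription layer uses CTC: the recurrent layers emit a label distribution for each frame of the feature sequence (left to right across the word image), and lexicon-free transcription takes the most likely label per frame, merges consecutive repeats, and drops blanks. Below is a minimal NumPy sketch of that best-path decoding. The lowercase alphabet, the blank at index 0, and the helper `one_hot_frames` are assumptions for illustration, not code from this repo.

```python
import numpy as np

# Assumed label layout: index 0 ('-') is the CTC blank, then lowercase letters.
ALPHABET = "-abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(scores: np.ndarray, alphabet: str = ALPHABET, blank: int = 0) -> str:
    """Best-path CTC decoding of a (T, C) matrix of per-frame label scores."""
    best = scores.argmax(axis=1).tolist()  # most likely label at each frame
    chars = []
    prev = -1
    for label in best:
        # Emit a label only when it changes from the previous frame and is not blank.
        if label != prev and label != blank:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)


def one_hot_frames(frame_labels: str, alphabet: str = ALPHABET) -> np.ndarray:
    """Hypothetical helper: toy (T, C) scores that peak at the given per-frame labels."""
    scores = np.full((len(frame_labels), len(alphabet)), 0.01)
    for t, ch in enumerate(frame_labels):
        scores[t, alphabet.index(ch)] = 1.0
    return scores


# Repeats merge, and the blank between the two "ll" runs keeps both l's.
print(ctc_greedy_decode(one_hot_frames("--hh-e-ll-ll-oo")))  # hello
```

Because the output string is assembled from per-frame character labels rather than picked from a word list, any string the alphabet can express is a possible output. That is what makes lexicon-free recognition possible.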

@bgshih How about including a Region Proposal Network inside CRNN to make it end-to-end trainable? Would such a model be harder to train? I tried both separately, and they gave nice results, as expected. Thanks for your valuable suggestions.

rremani, Jan 25 '17

@rremani Sorry, but I am not sure whether that will work. It's worth a try, I think.

bgshih, Jan 25 '17

@bgshih Thanks a lot for your suggestions!

rremani, Jan 26 '17

@rremani I am working on the problem of OCR in the wild (text detection + recognition on natural images). As @bgshih suggested, we need a text detection module on top of CRNN that extracts the text segments, which then become the input to CRNN. Could you please explain how you used a Region Proposal Network inside CRNN to make it end-to-end trainable? Any suggestions would be appreciated.

rayush7, Apr 18 '17

Hi @rayush7, I haven't tried an end-to-end model yet. I used SSD for text extraction and CRNN for recognition.

rremani, Apr 18 '17
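The two-stage setup described in this thread (a detector such as SSD producing word boxes, then CRNN reading each cropped word) can be sketched as below. `detect` and `recognize` are hypothetical callables standing in for whatever detector and recognizer you actually use. The fixed input height of 32 pixels and the nearest-neighbour resizing are assumptions chosen to keep the sketch dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def crop_and_resize(image: np.ndarray, box: Box, height: int = 32) -> np.ndarray:
    """Crop one word box and rescale it to a fixed height, keeping the aspect ratio."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    if h == 0 or w == 0:
        raise ValueError(f"empty box: {box}")
    new_w = max(1, round(w * height / h))
    # Nearest-neighbour sampling: map each output row/column back to a source index.
    rows = np.arange(height) * h // height
    cols = np.arange(new_w) * w // new_w
    return crop[rows[:, None], cols[None, :]]


def read_text(
    image: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[Box]],
    recognize: Callable[[np.ndarray], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: detect word boxes. Stage 2: recognize each cropped word on its own."""
    results = []
    for box in detect(image):
        word_image = crop_and_resize(image, box)
        results.append((box, recognize(word_image)))
    return results


# Usage with stand-in stages (hypothetical): the "recognizer" just reports crop size.
scene = np.zeros((480, 640), dtype=np.uint8)
fake_detect = lambda img: [(10, 20, 110, 60), (200, 300, 260, 316)]
fake_recognize = lambda word: f"{word.shape[1]}x{word.shape[0]}"
print(read_text(scene, fake_detect, fake_recognize))
# [((10, 20, 110, 60), '80x32'), ((200, 300, 260, 316), '120x32')]
```

Keeping the stages behind plain callables means the detector can be swapped (SSD, py-faster-rcnn, or another detector) without touching the recognition step.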

FYI, we have recently released another project for text detection https://github.com/MhLiao/TextBoxes.

bgshih, Apr 18 '17

@bgshih @rremani Thanks, guys. I will surely look into TextBoxes.

rayush7, Apr 18 '17

Hi @bgshih

Does your TextBoxes (https://github.com/MhLiao/TextBoxes) work on invoices, such as the one in the attached image (image_0032)?

Thank you

ahmedmazari-dhatim, May 16 '17