Is there a special training strategy(include pre-processing)?

Open peternara opened this issue 6 years ago • 1 comments

@bgshih Hi, I have implemented CRNN in TensorFlow based on your paper and source code. I've done a lot of testing and debugging, and I've even tried enlarging the network. I trained only on the synthetic training set (synths) used in the paper, so I believe both the network and the training set match yours. On the IIIT5K test set, however, my results fall short of those reported in your paper (perfect word match of 70-72%). I use 63 classes (a-z, A-Z, 0-9). Moreover, this figure excludes the test samples that contain digits. The synthetic training set is very imbalanced with respect to digits, so the results for numbers are not very good (I think).

So I wonder whether you used a special training strategy (including pre-processing) that I am missing. If so, could you tell me what it is?

thanks.

peternara, Nov 20 '17 01:11

Some notes about a simple preprocessing method: as far as I can remember, the CRNN approach to unifying the input size is to stretch the image to a default size of 100x32 or something similar, which means scaling the x and y axes by different factors to fit the target size.

I tested two other strategies:

a. Resize the image with the same factor for x and y so that it fits into the target size, and fill the rest of the target image with a suitable background colour.

b. Same as (a), but with randomization: each time the network asks for a new sample, compute (a) but apply random translations, random resizing, ...
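The two strategies can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiments: the function names `resize_nearest` and `fit_to_target`, the white background value, the top-left placement for (a), and the 0.75 to 1.0 random shrink range for (b) are all assumptions. A nearest-neighbour resize written in NumPy stands in for whatever image library you would normally use.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D grayscale array (stand-in for a library resize)."""
    h, w = img.shape
    rows = np.minimum((np.arange(new_h) * h / new_h).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * w / new_w).astype(int), w - 1)
    return img[rows][:, cols]

def fit_to_target(img, target_w=100, target_h=32, background=255, rng=None):
    """Scale img by ONE factor for both axes and paste it onto a background canvas.

    rng=None -> strategy (a): largest scale that fits, pasted at the top-left (assumed).
    rng given -> strategy (b): additional random shrink and random translation.
    """
    h, w = img.shape
    scale = min(target_w / w, target_h / h)      # same factor for x and y
    if rng is not None:
        scale *= rng.uniform(0.75, 1.0)          # assumed augmentation range
    new_w = max(1, min(target_w, int(round(w * scale))))
    new_h = max(1, min(target_h, int(round(h * scale))))
    resized = resize_nearest(img, new_h, new_w)

    # Free space left on the canvas determines the possible offsets.
    free_x, free_y = target_w - new_w, target_h - new_h
    if rng is None:
        x0, y0 = 0, 0
    else:
        x0 = int(rng.integers(0, free_x + 1))    # upper bound is exclusive
        y0 = int(rng.integers(0, free_y + 1))

    canvas = np.full((target_h, target_w), background, dtype=img.dtype)
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas

# Example: a dark 200x50 "word" image on a white 100x32 canvas.
word = np.zeros((50, 200), dtype=np.uint8)
fixed = fit_to_target(word)                                   # strategy (a)
augmented = fit_to_target(word, rng=np.random.default_rng(0)) # strategy (b)
```

With (b), calling the function again for the same image yields a differently placed and scaled sample, which is where the data augmentation effect comes from.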

Results on an HTR (handwritten text recognition) dataset:

CRNN: 8.2% CER
a: 6.2% CER
b: 5.7% CER

(b) performs a little better than (a) because of the randomization (data augmentation); (a) performs much better than the CRNN approach.
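For reference, CER (character error rate) is commonly computed as the total edit distance between predictions and ground truth divided by the total number of ground-truth characters. The sketch below shows that common definition; how the numbers above were actually computed is not stated here, so treat this as an assumption. The helper names `edit_distance` and `cer` are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def cer(pairs):
    """Character error rate over (prediction, ground_truth) pairs."""
    total_edits = sum(edit_distance(pred, truth) for pred, truth in pairs)
    total_chars = sum(len(truth) for _, truth in pairs)
    return total_edits / total_chars

# Example: one missing character in ten ground-truth characters.
rate = cer([("helo", "hello"), ("world", "world")])
```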

githubharald, Jan 11 '18 10:01