deep-text-recognition-benchmark

Which data is good to train?

Open kimlia545 opened this issue 4 years ago • 3 comments

The paper says: "This result showed that the diversity of training data can be more important than the number of training examples, and that the effects of using different training datasets is more complex than simply concluding more is better." You used MJSynth and SynthText in combination. I want to train on Korean-language data. Should I use data with a variety of colors, fonts, backgrounds, widths, gradients, distortions, and blurs?

kimlia545 avatar Jan 08 '21 01:01 kimlia545

I think using RGB images does not help, because the network input has one channel by default. However, you can change this by setting opt.rgb=True.
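To illustrate the point about input channels, here is a minimal sketch of how an RGB switch can drive the channel count and the image loading mode. It is a stand-alone example, not the repository's code: the helper names `build_parser`, `resolve_options`, and `pil_mode` are hypothetical, and the option names `--rgb` and `--input_channel` are assumed to mirror the `opt.rgb` setting mentioned above.

```python
import argparse


def build_parser():
    # Hypothetical parser; option names are assumptions modeled on `opt.rgb`.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rgb", action="store_true",
                        help="use 3-channel RGB input instead of grayscale")
    parser.add_argument("--input_channel", type=int, default=1,
                        help="number of channels the first conv layer expects")
    return parser


def resolve_options(argv):
    # Parse the arguments, then keep the channel count consistent with --rgb.
    opt = build_parser().parse_args(argv)
    if opt.rgb:
        opt.input_channel = 3
    return opt


def pil_mode(opt):
    # PIL mode string to pass to Image.convert(): "L" is single-channel grayscale.
    return "RGB" if opt.rgb else "L"
```

With the default options, images would be converted to grayscale before reaching the network, so color variety in the training set would be discarded at load time. Only when the RGB switch is enabled would color augmentation have a chance to affect training.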

yakhyo avatar Jan 12 '21 06:01 yakhyo

@yakhyo Thanks

kimlia545 avatar Feb 03 '21 06:02 kimlia545

Hi @kimlia545, do you happen to have a pretrained model for Korean (or Korean + English) that you can share? Since their site only supports around 10 tests per day, I would like to run a separate model on-premises. Thank you!

bit-scientist avatar Jun 21 '22 01:06 bit-scientist