caffe_ocr

Can you share the script used for generating the training dataset?

Open argman opened this issue 7 years ago • 4 comments

Really nice project btw!

The reason bi-LSTM does not affect accuracy is that the LSTM is more likely to be learning a language model (which works especially well for English words), but in Chinese, it depends on how you generate your data.

argman avatar Oct 28 '17 15:10 argman

Yes, you are right: an attention-based encoder-decoder should be better than LSTM+CTC at modeling the language model. Generating a Chinese dataset is more complicated than you might think, but I will share my simplified code soon.

senlinuc avatar Oct 29 '17 14:10 senlinuc

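I would really like to know how your training data is produced: how are the characters composited onto the backgrounds, stretched, and so on? Could you share the script so I can study it?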

zhousteven avatar Nov 07 '17 08:11 zhousteven

When will the data generation code be released? @senlinuc

blacklee5 avatar Feb 03 '18 08:02 blacklee5

I wanted to test the results, but it won't even compile. Could someone send me a prebuilt version? [email protected]

nmwhqjl avatar Apr 10 '18 06:04 nmwhqjl