chinese-subtitle-ocr

Question about images.csv

Open qimingfeijin opened this issue 6 years ago • 5 comments

Running download_images.py fails with the error: No such file or directory: 'images.csv'. How can I fix this?

qimingfeijin avatar Feb 18 '19 12:02 qimingfeijin

Hi,

instead of download_images.py, just use the COCO dataset. It is much smaller, and for OCR you don't actually need that many images. You can download 5K images directly here: http://images.cocodataset.org/zips/val2017.zip. Then you don't need download_images.py at all.
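A minimal sketch of fetching and unpacking that archive with the Python standard library, in case it helps. The function names and the `backgrounds` folder are made up for illustration and are not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the comment above.
COCO_VAL_URL = "http://images.cocodataset.org/zips/val2017.zip"


def download(url, dest):
    """Download url to dest unless the file is already there."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_jpgs(zip_path, out_dir):
    """Extract only the .jpg members of the archive and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".jpg")]
        zf.extractall(out_dir, members=names)
    return sorted(out_dir / n for n in names)
```

Calling `extract_jpgs(download(COCO_VAL_URL, "val2017.zip"), "backgrounds")` should leave the images under `backgrounds/`, which can then stand in for whatever download_images.py would have fetched.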

Hope this helps.

lars76 avatar Feb 19 '19 19:02 lars76

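Thank you for your help and for sharing. I want to do Chinese text detection and need some Chinese images for training and testing. Where did you download your Chinese dataset?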

qimingfeijin avatar Feb 20 '19 02:02 qimingfeijin

I generated the dataset myself by using a subtitle file (srt) and then doing manual annotation. I don't think that there are any datasets that you can download.
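For anyone who wants to start from a subtitle file too, here is a rough sketch of reading cue times and text out of an .srt file. It only illustrates the file format and is an assumption about what a first step could look like, not the script used for this project; `Cue` and `parse_srt` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches a timing line such as "00:00:01,000 --> 00:00:02,500".
_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Split the file into blank-line separated blocks and read one cue per block."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIME.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is subtitle text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if text:
                    cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

The cue times tell you which video frames should show a subtitle, and the cue text can serve as a starting label before manual correction.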

Most papers actually generate their own training/test images by rendering random text onto images. Look at this GitHub project: https://github.com/JarveeLee/SynthText_Chinese_version. The corresponding paper is described here: https://blog.csdn.net/u010167269/article/details/52389676. I tried something similar myself, and it produced results equal to or better than a real dataset.
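To make the idea concrete, here is a very small sketch using Pillow: draw a text string at a random position on a background image and keep the bounding box as the label. This is a simplified illustration, not SynthText and not the exact method used for this project; `render_text_mask` and `synth_sample` are hypothetical names, and for Chinese text you would need to supply a font file that contains CJK glyphs.

```python
import random

from PIL import Image, ImageDraw, ImageFont


def render_text_mask(text, font, canvas_size):
    """Draw text on a blank grayscale canvas and crop it to the inked area."""
    canvas = Image.new("L", canvas_size, 0)
    ImageDraw.Draw(canvas).text((0, 0), text, fill=255, font=font)
    bbox = canvas.getbbox()
    if bbox is None:
        raise ValueError("text produced no visible pixels")
    if bbox[2] >= canvas_size[0] or bbox[3] >= canvas_size[1]:
        raise ValueError("text does not fit on the background")
    return canvas.crop(bbox)


def synth_sample(background, text, font, rng, color=(255, 255, 255)):
    """Return (image, box): a copy of background with text pasted at a random
    position, and the (left, top, right, bottom) box of the pasted text."""
    glyphs = render_text_mask(text, font, background.size)
    w, h = glyphs.size
    x = rng.randint(0, background.width - w)
    y = rng.randint(0, background.height - h)
    box = (x, y, x + w, y + h)
    out = background.copy()
    # Fill the box with the text color, using the rendered text as the mask.
    out.paste(color, box, glyphs)
    return out, box
```

For example, with `font = ImageFont.truetype("path/to/a-cjk-font.ttf", 32)` (a placeholder path) and a COCO image opened with `Image.open(...).convert("RGB")`, `synth_sample(img, "你好世界", font, random.Random(0))` gives one training image plus its box. A real pipeline would also vary the font, size, color, and blur.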

lars76 avatar Feb 20 '19 22:02 lars76

I understand. Thank you for sharing.

qimingfeijin avatar Feb 21 '19 01:02 qimingfeijin

@lars76 can you share your method for synthesizing the dataset?

wushilian avatar May 06 '19 06:05 wushilian