CNN-for-handwritten-kanji

What dataset were these networks trained on?

Open livnev opened this issue 8 years ago • 5 comments

What dataset were these networks trained on? Neither the readme nor the scripts mention which dataset you used to achieve the results you are demonstrating. I am curious: is there a publicly available handwritten kanji dataset?

livnev avatar Oct 24 '16 22:10 livnev

Sorry, the dataset is not publicly available. But you can see some samples in the Dataset-Augmentation folder. So if you have images that look like that, the model should work on them.

KyotoSunshine avatar Nov 30 '16 08:11 KyotoSunshine

Is there a public list of the kanji characters that are contained within the dataset used to train the model?

huangwaylon avatar Jan 11 '17 20:01 huangwaylon

I am afraid there is not, but there were about 500 characters if I remember correctly. So rather than using the pretrained model, I would suggest you just train this model on your own data.

KyotoSunshine avatar Jan 12 '17 01:01 KyotoSunshine

I see, thank you for the information!

huangwaylon avatar Jan 12 '17 01:01 huangwaylon

@huangwaylon You can use the datasets from this site: http://etlcdb.db.aist.go.jp/

r3m4k3 avatar Dec 31 '18 02:12 r3m4k3