VietnameseOCR How to add more characters like chinese and korean characters?

Open jetjodh opened this issue 6 years ago • 3 comments

Aug 07 '18 10:08 jetjodh

Hi @jetjodh, sorry for the late reply.

I suggest using the repo https://github.com/miendinh/vi.word2img to generate a training dataset for Chinese and Korean characters, with an output dimension of 28 x 28 pixels. After that, update the file https://github.com/miendinh/VietnameseOCR/blob/master/data/vi.characters.csv and place your dataset in the folder https://github.com/miendinh/VietnameseOCR/tree/master/data/train/characters. To understand how the VietnameseOCR dataset is organized, read https://github.com/miendinh/VietnameseOCR/blob/master/generate_dataset.py and apply it to Chinese or Korean characters.
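The character-table step could be sketched roughly as below. This is only an illustration, not code from the repo: it assumes the CSV holds one character per row in the first column, which may not match the real layout of vi.characters.csv (check generate_dataset.py first). The helper names `hangul_syllables`, `cjk_ideographs` and `extend_character_table` are made up for this example, and rendering the 28 x 28 images is still left to vi.word2img.

```python
import csv
import os


def hangul_syllables(limit=None):
    """Return precomposed Hangul syllables (U+AC00..U+D7A3)."""
    chars = [chr(cp) for cp in range(0xAC00, 0xD7A3 + 1)]
    return chars if limit is None else chars[:limit]


def cjk_ideographs(limit=None):
    """Return characters from the basic CJK Unified Ideographs block (from U+4E00)."""
    chars = [chr(cp) for cp in range(0x4E00, 0x9FFF + 1)]
    return chars if limit is None else chars[:limit]


def extend_character_table(csv_path, new_chars):
    """Add characters that are not listed yet and return the full ordered list.

    ASSUMPTION: one character per row, stored in the first column.
    Existing rows keep their order, so existing class indices stay stable.
    """
    existing = []
    if os.path.exists(csv_path):
        with open(csv_path, newline="", encoding="utf-8") as f:
            existing = [row[0] for row in csv.reader(f) if row]

    seen = set(existing)
    # dict.fromkeys drops duplicates in new_chars while keeping their order.
    added = [c for c in dict.fromkeys(new_chars) if c not in seen]

    all_chars = existing + added
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([c] for c in all_chars)
    return all_chars
```

A call such as `extend_character_table("data/vi.characters.csv", hangul_syllables(limit=100))` would then add the first 100 Hangul syllables. Starting with a small subset keeps the number of classes manageable while you check that training still runs.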

Aug 16 '18 02:08 miendinh

I cannot follow the steps above for adding more characters to VietnameseOCR. I have a dataset of Gujarati characters in the same format as yours (i.e. 28x28 pixel values followed by the label of that character). I also updated your 'vi.characters.csv' file with my Gujarati characters and then tried to train the model, but I get many errors when running it. Could you please help me get this working?
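One way to narrow down errors like these is to check the data before training. The sketch below is not part of VietnameseOCR. It assumes each row is stored as 784 integer pixel values in the range 0..255 followed by the label, and that every label must appear in the character table. The function names `check_rows` and `check_file` are made up for this example.

```python
import csv

PIXELS = 28 * 28  # 784 pixel values per 28x28 image


def check_rows(rows, known_labels):
    """Return a list of (row_number, problem) tuples; an empty list means no problems.

    ASSUMPTION: each row is 784 integer pixel values (0..255) followed by one label.
    """
    problems = []
    for n, row in enumerate(rows, start=1):
        if len(row) != PIXELS + 1:
            problems.append((n, f"expected {PIXELS + 1} fields, got {len(row)}"))
            continue
        *pixels, label = row
        try:
            values = [int(p) for p in pixels]
        except ValueError:
            problems.append((n, "non-integer pixel value"))
            continue
        if any(v < 0 or v > 255 for v in values):
            problems.append((n, "pixel value outside 0..255"))
        if label not in known_labels:
            problems.append((n, f"label {label!r} missing from character table"))
    return problems


def check_file(path, known_labels):
    """Run check_rows over a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return check_rows(csv.reader(f), known_labels)
```

If this reports no problems, the remaining errors are more likely to come from the model itself, for example an output layer whose size no longer matches the number of characters, or from the TensorFlow version in use.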

Mar 26 '20 10:03 kirtanc25

Hi @royalk2c,

I used quite an old version of TensorFlow (TF). I recommend trying the latest version, which includes the Keras interface and makes life easier.

You can use TF's data augmentation API to improve accuracy: https://www.tensorflow.org/tutorials/images/data_augmentation

Also, you can create more training data with https://github.com/miendinh/vi.word2img
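As a rough illustration of the Keras route, a small model with augmentation layers might look like the sketch below. This is not the architecture used in VietnameseOCR. The layer sizes, the augmentation factors and the `build_model` name are assumptions for this example, and it needs a TensorFlow 2.x release recent enough to provide these preprocessing layers under `tf.keras.layers`.

```python
import tensorflow as tf


def build_model(num_classes):
    """Small CNN for 28x28 grayscale character images (illustrative only)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        # Scale 0..255 pixel values to 0..1.
        tf.keras.layers.Rescaling(1.0 / 255),
        # Mild augmentation, active only during training. Flips are left out
        # because a mirrored character is usually a different or invalid glyph.
        tf.keras.layers.RandomRotation(0.03),
        tf.keras.layers.RandomTranslation(0.1, 0.1),
        tf.keras.layers.RandomZoom(0.1),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        # One logit per character in the character table.
        tf.keras.layers.Dense(num_classes),
    ])


# num_classes must equal the number of rows in your character table;
# 100 is only a placeholder value.
model = build_model(num_classes=100)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Training would then be a call like `model.fit(images, labels, epochs=...)`, with `images` shaped `(N, 28, 28, 1)` and `labels` holding integer class indices.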

Good luck!

Apr 28 '20 04:04 miendinh
