VietnameseOCR
How can I add more characters, such as Chinese and Korean characters?
Hi @jetjodh, sorry for the late reply.
I suggest using this repo, https://github.com/miendinh/vi.word2img, to generate a training dataset for Chinese and Korean characters with an output size of 28 x 28 pixels. After that, update this file, https://github.com/miendinh/VietnameseOCR/blob/master/data/vi.characters.csv, and place your dataset in this folder: https://github.com/miendinh/VietnameseOCR/tree/master/data/train/characters. To understand how the VietnameseOCR dataset is organized, read https://github.com/miendinh/VietnameseOCR/blob/master/generate_dataset.py and apply the same approach to Chinese or Korean characters.
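Before training on a new character set, it can help to check that the dataset and the updated character list agree. The sketch below is only an illustration under stated assumptions, not the layout used by this repo: it assumes each dataset row is 784 comma-separated pixel values (28 x 28, range 0..255) followed by the character label, and that the character file lists one character per line. The function names `load_characters` and `find_bad_rows` are hypothetical; the real layout is whatever generate_dataset.py produces.

```python
import csv
import io

IMAGE_SIDE = 28
PIXELS_PER_ROW = IMAGE_SIDE * IMAGE_SIDE  # 784 pixel values per image


def load_characters(handle):
    """Read one character per line (assumed layout), skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def is_pixel(text):
    """True if the text parses as an integer in the range 0..255."""
    try:
        return 0 <= int(text) <= 255
    except ValueError:
        return False


def find_bad_rows(handle, characters):
    """Return (row_number, reason) for each row that breaks the assumed layout."""
    problems = []
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) != PIXELS_PER_ROW + 1:
            problems.append((row_number, f"expected {PIXELS_PER_ROW + 1} fields, got {len(row)}"))
            continue
        *pixels, label = row
        if label not in characters:
            problems.append((row_number, f"label {label!r} is not in the character list"))
        elif not all(is_pixel(p) for p in pixels):
            problems.append((row_number, "pixel value is not an integer in 0..255"))
    return problems


# Tiny in-memory demo; with real files, pass open(path, encoding="utf-8", newline="").
characters = load_characters(io.StringIO("ક\nખ\n"))
good_row = ",".join(["0"] * PIXELS_PER_ROW + ["ક"])
short_row = ",".join(["0"] * 10 + ["ખ"])
for row_number, reason in find_bad_rows(io.StringIO(good_row + "\n" + short_row + "\n"), characters):
    print(f"row {row_number}: {reason}")
```

Rows flagged here (wrong field count, unknown label, non-numeric pixels) are a common source of errors at training time, so fixing them first narrows down what is left to debug.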
I cannot understand the above steps for adding more characters to VietnameseOCR. I have a dataset of Gujarati characters in the same format as yours (i.e. 28x28 pixel values followed by the label of that character). I also updated your 'vi.characters.csv' file with my Gujarati characters and then tried to train the model, but I am getting many errors when running it. Could you please help me get this working?
Hi @royalk2c,
I used a fairly old version of TensorFlow (TF). I recommend trying the latest version, which includes the Keras interface and makes life easier.
You can use TF's data augmentation API to improve accuracy: https://www.tensorflow.org/tutorials/images/data_augmentation
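To illustrate what augmentation does for small character images, here is a minimal sketch in plain Python. It is not the TF API covered in the tutorial above; it only shows the idea of producing extra training samples by shifting a 28 x 28 image a few pixels. The function names `shift_image` and `augment` and the default shift of 2 pixels are assumptions chosen for the example.

```python
import random

IMAGE_SIDE = 28


def shift_image(image, dx, dy, fill=0):
    """Return a copy of a square image moved dx pixels right and dy pixels down.

    Pixels pushed outside the frame are dropped; uncovered pixels get `fill`.
    """
    side = len(image)
    shifted = [[fill] * side for _ in range(side)]
    for y in range(side):
        for x in range(side):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < side and 0 <= new_y < side:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, copies=4, max_shift=2, seed=None):
    """Create `copies` randomly shifted variants of one image."""
    rng = random.Random(seed)
    variants = []
    for _ in range(copies):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        variants.append(shift_image(image, dx, dy))
    return variants


# Demo: a blank image with a single bright pixel near the centre.
sample = [[0] * IMAGE_SIDE for _ in range(IMAGE_SIDE)]
sample[10][10] = 255
extra_samples = augment(sample, copies=4, max_shift=2, seed=0)
print(len(extra_samples), "augmented images generated")
```

Each variant keeps the label of the original image, so the training set grows without any new labelling work. Small shifts are usually a safe choice for characters, while flips or large rotations can turn one character into a different one.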
You can also create more training data with https://github.com/miendinh/vi.word2img
Good luck!