handwriting-ocr: How do I create CSV files for my own dataset?

How do I create CSV files for my own dataset? I have my own images (all capitals). I tried to use the word_labeling notebook, but it threw this error: "word_normalization() got an unexpected keyword argument 'hystNorm'". How should I proceed?

Open Aditichintamani opened this issue 6 years ago • 4 comments

Feb 20 '19 06:02 Aditichintamani

Hi, can you specify more precisely what data you have? If you already have images with labels, you don't need to use the word_labeling notebook. Instead, you should write a script that processes those images so that each image is named 'label_randomtimestamp.png'. Such images can then be processed using data_normalization.py and data_create_sets.py. I will try to write some documentation on how to add your own images.
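The renaming step described above could be sketched roughly as follows. This is only an illustration, not a script from the project: the function name `rename_labeled_images`, the `labels` mapping from file name to word, and the use of a nanosecond timestamp as the "random timestamp" part are all assumptions. It also assumes the cropped images are already PNG files, since copying a file under a new name does not convert its format.

```python
import shutil
import time
from pathlib import Path


def rename_labeled_images(labels, src_dir, dst_dir):
    """Copy each labeled PNG into dst_dir as '<label>_<timestamp>.png'.

    `labels` is a hypothetical mapping {source file name: word label}.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, label in labels.items():
        source = src_dir / filename
        if source.suffix.lower() != ".png":
            raise ValueError(f"{filename} is not a PNG; convert it first")
        # The timestamp keeps names unique when the same word occurs twice.
        target = dst_dir / f"{label}_{time.time_ns()}.png"
        while target.exists():
            target = dst_dir / f"{label}_{time.time_ns()}.png"
        shutil.copy2(source, target)
        written.append(target)
    return written
```

For example, calling `rename_labeled_images({"crop_001.png": "HELLO"}, "cropped_words", "labeled_words")` would copy the file to something like `labeled_words/HELLO_<timestamp>.png`; the folder names here are placeholders.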

Also, can you specify where exactly you got this error? The naming has changed, so the argument is now called hyst_norm.
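The simplest fix is to change `hystNorm` to `hyst_norm` at the call site. If older notebooks cannot be edited easily, a small compatibility wrapper is one possible workaround. The sketch below is an assumption, not project code: `accept_legacy_kwargs` is a hypothetical helper, and `word_normalization_stub` is a stand-in, because the real signature of `word_normalization` is not shown in this thread. Only the `hystNorm` to `hyst_norm` rename comes from the discussion.

```python
import functools

# Old keyword name -> new keyword name. Only this pair is known from the thread.
RENAMED_KWARGS = {"hystNorm": "hyst_norm"}


def accept_legacy_kwargs(func):
    """Translate old camelCase keyword names before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for old, new in RENAMED_KWARGS.items():
            if old in kwargs:
                value = kwargs.pop(old)
                # Prefer an explicitly passed new-style keyword if both are given.
                kwargs.setdefault(new, value)
        return func(*args, **kwargs)
    return wrapper


@accept_legacy_kwargs
def word_normalization_stub(image, hyst_norm=False):
    """Stand-in for the real function; it only echoes its arguments."""
    return {"image": image, "hyst_norm": hyst_norm}
```

With the wrapper applied, a call using the old keyword, such as `word_normalization_stub(img, hystNorm=True)`, is forwarded as `hyst_norm=True` instead of raising a `TypeError`.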

Feb 24 '19 15:02 Breta01

TODO

[ ] Write documentation
[ ] Rework word_labeling notebook and naming

Feb 24 '19 15:02 Breta01

The data I am working with is printed text in all capitals, not handwritten. I have cropped the main image into multiple smaller images so that each smaller image contains just one word from the main image. Now I am trying to create .csv files for all the images and train the model. To be specific, please guide me through these problems:

1) When I run create_csv.py over the train, test and dev folders, three CSV files are created, but they contain no data except the headers (i.e. label, shape, image, gaplines).
2) I also studied the CSV files you are using: they contain 0 as the first entry and digits for each letter in the words.

Can you please guide me through this? Thank you.

Feb 28 '19 06:02 Aditichintamani

If you are working on printed text recognition, you may get better results with a different OCR project, for example https://github.com/tesseract-ocr/tesseract. If you still want to use this project, you first have to label the words. Manually labelling printed text is a waste of time; generate it artificially or use an existing dataset. Then you would have to normalize the words, split them into sets, and finally create the CSV files.
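As a rough illustration of the "split them into sets" step, the sketch below shuffles labeled images and divides them into train, dev and test lists. It is not the project's data_create_sets.py: the function names, the 80/10/10 ratios and the fixed seed are assumptions, and it relies on the 'label_randomtimestamp.png' naming convention mentioned earlier in this thread.

```python
import random
from pathlib import Path


def split_into_sets(image_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the PNG files in image_dir and split them into three sets."""
    files = sorted(Path(image_dir).glob("*.png"))
    if not files:
        raise FileNotFoundError(f"no .png files found in {image_dir}")
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_dev = int(len(files) * ratios[1])
    return {
        "train": files[:n_train],
        "dev": files[n_train:n_train + n_dev],
        "test": files[n_train + n_dev:],  # whatever remains
    }


def label_from_name(path):
    """Recover the label from a name like 'HELLO_1551000000.png'."""
    return Path(path).stem.rsplit("_", 1)[0]
```

The explicit error for an empty folder is deliberate: if no images match the expected naming or extension, anything built from the result would have no rows, so it may be worth checking this first when generated CSV files contain only headers.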

Mar 02 '19 18:03 Breta01
