handwriting-ocr
How do I create CSV files for my own dataset? I have my own images (all capitals). I tried to use the word_labeling notebook, but it threw this error: "word_normalization() got an unexpected keyword argument 'hystNorm'". How should I proceed?
Hi,
Can you specify more precisely what data you have? If you already have images with labels, you don't need to use the word_labeling notebook. Instead, you should write a script that processes those images so that each image is named 'label_randomtimestamp.png'. Such images can then be processed using `data_normalization.py` and `data_create_sets.py`. I will try to write some documentation on how to add your own images.
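A minimal sketch of such a renaming script, in case it helps. It is not part of the project: the function name `copy_with_label_names`, the `labels` mapping and the exact timestamp format are assumptions for illustration, and it assumes the source images are already PNG files.

```python
import shutil
import time
from pathlib import Path


def copy_with_label_names(labels, src_dir, dst_dir):
    """Copy each labeled image to dst_dir as '<label>_<timestamp>.png'.

    `labels` maps a source file name to the word shown in that image.
    Assumes the source images are already PNG files (no conversion is done).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    # One base timestamp plus a running index keeps every name unique,
    # even when two images carry the same label.
    base = time.time_ns()
    created = []
    for index, (file_name, label) in enumerate(sorted(labels.items())):
        target = dst_dir / f"{label}_{base + index}.png"
        shutil.copy2(src_dir / file_name, target)
        created.append(target)
    return created
```

For example, `copy_with_label_names({"crop_001.png": "HELLO"}, "crops", "labeled")` would copy `crops/crop_001.png` to `labeled/HELLO_<timestamp>.png` (the folder and file names here are made up). Copying rather than renaming in place keeps the original crops untouched.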
Also, can you specify where exactly you got this error? The naming has changed, so the argument is now called `hyst_norm`.
TODO
- [ ] Write documentation
- [ ] Rework word_labeling notebook and naming
The data I am working with is printed text in all capitals, not handwritten. I have cropped the main image into multiple smaller images so that each smaller image contains just one word from the main image. Now I am trying to create .csv files for all the images and train the model. To be specific, please guide me through these problems:
- When I run create_csv.py over the train, test and dev folders, three CSV files are created, but they contain no data except the headers (i.e. label, shape, image, gaplines).
- I also studied the CSV files you are using: they contain 0 as their first entry and digits for each letter in the words.

Can you please guide me through this? Thank you.
If you are working on printed text recognition, you may get better results with a different OCR project, for example https://github.com/tesseract-ocr/tesseract. If you still want to use this project, you first have to label the words. Manually labelling printed text is a waste of time; do it artificially or use an existing dataset. Then you would have to normalize the words, split them into sets, and finally create the CSV files.
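To illustrate the "split them into sets" step, here is a rough sketch. It is not the project's `data_create_sets.py`: the function names, the 80/10/10 ratios and the fixed seed are assumptions, and it only relies on the 'label_randomtimestamp.png' naming mentioned earlier in this thread.

```python
import random
from pathlib import Path


def label_from_name(path):
    """Recover the label from a file named '<label>_<timestamp>.png'."""
    # Split on the last underscore only, so labels may contain underscores.
    return Path(path).stem.rsplit("_", 1)[0]


def split_into_sets(paths, dev_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle paths reproducibly and split them into train/dev/test lists."""
    paths = sorted(paths)  # fixed starting order, so the seed fully decides the split
    random.Random(seed).shuffle(paths)
    n_dev = int(len(paths) * dev_ratio)
    n_test = int(len(paths) * test_ratio)
    return {
        "dev": paths[:n_dev],
        "test": paths[n_dev:n_dev + n_test],
        "train": paths[n_dev + n_test:],
    }
```

Calling `split_into_sets(list(Path("labeled").glob("*.png")))` on a folder of labeled word images (the folder name is hypothetical) returns three disjoint lists, and `label_from_name` gives the word for each file when writing the rows.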