tessdata_shreetest

Accounting\currency version

Open oldominion opened this issue 6 years ago • 4 comments

Could you possibly add a traineddata file specialized for accounting purposes? It would cover 1-9, dot, comma, various currency symbols such as '$£€', dash, colon/semicolon, etc.

€ is the main problem for me: it's invariably detected as a 6 or an 8 instead of being ignored, and since I'm looking for digits only, I have no way of correcting the output via post-processing.

oldominion avatar Jun 03 '19 23:06 oldominion

Please see https://github.com/tesseract-ocr/tessdata/pull/120

Shreeshrii avatar Jun 04 '19 02:06 Shreeshrii

Thanks! Sadly that one doesn't include currency symbols, so it doesn't help avoid the frequent misclassification of €, for example.

oldominion avatar Jun 04 '19 21:06 oldominion

If you can make a training text with the kind of symbols you need, I can run the training.

See samples of training text used for other traineddata:

https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text

https://github.com/Shreeshrii/tessdata_shreetest/blob/master/engrestrict.training_text
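For anyone wanting to put together such a training text, here is a minimal sketch in Python of one way to generate lines of accounting-style strings. It is not the script used for the linked files. The character set comes from the request earlier in this thread, while the function names, line layout, and number ranges are assumptions for illustration only. Compare the output with the linked samples before using it.

```python
import random

# Assumed character set, taken from the request in this thread: digits, dot,
# comma, currency symbols, dash, colon and semicolon. Adjust as needed.
CURRENCY_SYMBOLS = "$£€"
SEPARATORS = [" ", " ; ", " : ", " - "]


def random_amount(rng):
    """Return one amount such as '€1,234.56' or '12.00 $' (hypothetical layout)."""
    whole = rng.randint(0, 9_999_999)
    cents = rng.randint(0, 99)
    number = f"{whole:,}.{cents:02d}"  # thousands separators plus two decimals
    if rng.random() < 0.2:
        number = "-" + number  # occasional negative amount
    symbol = rng.choice(CURRENCY_SYMBOLS)
    # Put the symbol before or after the number so both positions are seen.
    return symbol + number if rng.random() < 0.5 else number + " " + symbol


def random_line(rng, max_items=6):
    """Join several amounts with one of the separators listed above."""
    items = [random_amount(rng) for _ in range(rng.randint(2, max_items))]
    return rng.choice(SEPARATORS).join(items)


def make_training_text(n_lines=1000, seed=0):
    """Build the whole text; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return "\n".join(random_line(rng) for _ in range(n_lines)) + "\n"


def write_training_text(path, n_lines=1000, seed=0):
    """Write the text as UTF-8 so that £ and € survive intact."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(make_training_text(n_lines, seed))
```

Calling `write_training_text("eng.accounting.training_text")` would produce a file to share here; that filename is only a guess modelled on the samples above.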

Shreeshrii avatar Jun 05 '19 05:06 Shreeshrii

@Shreeshrii is there any way you can help me make a traineddata file for a single image (it is basically a check box) for my project?

samrood121 avatar Jul 16 '21 06:07 samrood121