
Wrong character selected with multiple traineddata files

Open · ilochray opened this issue 5 years ago · 2 comments

I have two traineddata files, one English and one custom. When I process a particular document with either file on its own, the text is read correctly. When I use them together (`eng+cst`), however, the digit "6" is often read as "8". I added debug output for the confidence values: when recognition works, Tesseract is about 90% confident in the "6"; when it fails, it is 99% confident in the incorrect "8". Is there a known issue with using multiple traineddata files? I am using version 3.3.0 of the .NET wrapper (Tesseract 3.05) in a C# program targeting .NET 3.5.

ilochray, Jun 18 '19 12:06

I found this issue, which sounds like the same problem: https://github.com/tesseract-ocr/tesseract/issues/1537

ilochray, Jun 21 '19 15:06

Don't use the API. Setting `classify_enable_learning` to `0` in a config file will work.
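A minimal sketch of such a config file, assuming it is saved as `tessdata/configs/nolearn` (the name `nolearn` is hypothetical). Tesseract config files hold one parameter per line, name followed by value, and lines starting with `#` are ignored. `classify_enable_learning` switches the adaptive classifier, which otherwise adapts to characters already recognised in the document.

```
# tessdata/configs/nolearn (hypothetical file name)
# Disable the adaptive classifier
classify_enable_learning 0
```

With the command-line tool, config names go after the output base, e.g. `tesseract input.png output -l eng+cst nolearn` (the image and output names are placeholders).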

jrbastien, Jun 21 '19 18:06