tesseract
Wrong character selected with multiple traineddata files
I have two trained data files, one English and one custom. When I process a particular document with either file on its own, the data is read correctly. When I use them together (eng+cst), however, the number "6" is often read as "8". I added confidence debug code and saw that when the result is correct, Tesseract is about 90% confident in the "6"; when it is wrong, it reports 99% confidence in the "8". Is there a known issue with using multiple trained data files? I am using version 3.3.0 of the wrapper (Tesseract 3.0.5) in a C# program targeting .NET 3.5.
I found this issue, which sounds like the same problem: https://github.com/tesseract-ocr/tesseract/issues/1537
Don't set it through the API. Using a config file that contains classify_enable_learning 0 will work.
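For anyone trying this, a minimal sketch of such a config file follows. The file name nolearn is hypothetical; the assumption is that it is saved as plain text in the configs folder under tessdata, with one "name value" pair per line.

```
classify_enable_learning 0
```

With the command-line tool, config file names go after the output base, for example `tesseract page.png out -l eng+cst nolearn` (page.png and out are placeholders). The .NET wrapper should have TesseractEngine constructor overloads that accept a config file name, but check the signatures in the wrapper version you are using before relying on that.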