
Character-based output from Tesseract

Open tanyapohn opened this issue 5 years ago • 0 comments

Hi again,

With this wrapper, I wonder why, for some languages other than English, the Tesseract API with tessdata-best.traineddata returns results per character rather than per word, as it does for English. For example:

Thai

69 confidence: 93.2952651977539 - [63, 74, 74, 85]; ห
70 confidence: 93.29107666015625 - [77, 74, 83, 85]; า
71 confidence: 93.30585479736328 - [75, 64, 100, 93]; ให
72 confidence: 93.0483627319336 - [101, 70, 105, 85]; ้
73 confidence: 93.2821044921875 - [111, 69, 116, 85]; ร

Eng

0 confidence: 96.37889099121094 - [358, 42, 443, 66]; FOCUS
1 confidence: 95.37885284423828 - [147, 263, 328, 294]; LEADERS
2 confidence: 95.37885284423828 - [341, 266, 653, 294]; CONCENTRATE
3 confidence: 90.43708801269531 - [116, 315, 506, 342]; SINGLE-MINDEDLY

Do you have any suggestions for config settings that would make the results word-based or text-line-based?
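For context, Tesseract's own result iterator works at a chosen level (symbol, word, text line); whether this wrapper exposes that choice is not something I could confirm. As a fallback, the character results could be merged into text lines after recognition. Below is a minimal sketch of that post-processing, not a Tesseract or wrapper setting. It assumes each box is `[x1, y1, x2, y2]` and that results arrive in reading order; `merge_into_lines`, `sep`, and `min_overlap` are hypothetical names introduced here for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x1, y1, x2, y2)
Item = Tuple[str, float, Box]     # (text, confidence, box)


def merge_into_lines(items: List[Item], sep: str = "",
                     min_overlap: float = 0.5) -> List[Item]:
    """Group results whose boxes overlap vertically into text lines.

    Input order is kept within each line, so the recognizer's reading
    order is preserved. `sep` joins the pieces ("" for Thai characters,
    " " for English words).
    """
    groups: List[List[Item]] = []
    spans: List[List[int]] = []   # [top, bottom] of each group so far

    for item in items:
        _, _, (x1, y1, x2, y2) = item
        for group, span in zip(groups, spans):
            overlap = min(span[1], y2) - max(span[0], y1)
            shorter = min(span[1] - span[0], y2 - y1)
            if shorter > 0 and overlap / shorter >= min_overlap:
                group.append(item)
                span[0], span[1] = min(span[0], y1), max(span[1], y2)
                break
        else:
            # No existing line overlaps enough: start a new one.
            groups.append([item])
            spans.append([y1, y2])

    merged: List[Item] = []
    for group in groups:
        boxes = [box for _, _, box in group]
        union = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                 max(b[2] for b in boxes), max(b[3] for b in boxes))
        text = sep.join(text for text, _, _ in group)
        conf = sum(conf for _, conf, _ in group) / len(group)
        merged.append((text, conf, union))
    return merged


# The Thai results quoted above, as (text, confidence, box).
thai = [
    ("ห", 93.2952651977539, (63, 74, 74, 85)),
    ("า", 93.29107666015625, (77, 74, 83, 85)),
    ("ให", 93.30585479736328, (75, 64, 100, 93)),
    ("้", 93.0483627319336, (101, 70, 105, 85)),
    ("ร", 93.2821044921875, (111, 69, 116, 85)),
]

for text, conf, box in merge_into_lines(thai):
    print(f"confidence: {conf:.2f} - {list(box)}; {text}")
```

This only recovers text lines. Splitting a Thai line into words would need a separate word segmenter, since Thai is written without spaces between words.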

image sample

tanyapohn, Jun 12 '19 07:06