Confidence score

Open MathieuCliche opened this issue 8 years ago • 6 comments

Is it possible to get a confidence score for the predictions (not the orientation)?

MathieuCliche avatar Mar 24 '17 18:03 MathieuCliche

You mean one confidence score for the OCR on the whole image? I'm not even sure whether Tesseract provides such a score.

jflesch avatar Mar 25 '17 02:03 jflesch

Yeah, for the whole image, or per word. From what I read, it's possible to get it from the hOCR or TSV output. You can check it out here: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github

For example, the TSV output has a column "conf", which gives the confidence for each word.

MathieuCliche avatar Mar 25 '17 02:03 MathieuCliche
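
To illustrate the "conf" column mentioned above, here is a minimal sketch that reads word-level confidences from TSV text. It assumes the usual Tesseract TSV header (`level`, `page_num`, ..., `conf`, `text`), where rows that are not words carry a `conf` of -1. The sample data and the `word_confidences` helper are made up for illustration and are not part of Tesseract or pyocr.

```python
import csv
import io

# Hypothetical sample laid out like Tesseract's TSV output; the values are invented.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t104\t92\t70\t24\t71\tworld\n"
)


def word_confidences(tsv_text):
    """Return (word, confidence) pairs for the word rows of a TSV dump."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page, block, paragraph and line rows have conf = -1 and no text.
        if conf < 0 or not text:
            continue
        words.append((text, conf))
    return words


for word, conf in word_confidences(SAMPLE_TSV):
    print(word, conf)
```

In practice the TSV text would come from running Tesseract with its TSV output enabled, as described on the wiki page linked above.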

OK, good to know. For the words, I guess it can be added as an attribute to pyocr.builders.Box objects. Regarding the whole image, with the current API, it's going to be a little more complicated...

jflesch avatar Mar 25 '17 19:03 jflesch

For the per-word confidence, you can thank @a-pagano: https://github.com/openpaperwork/pyocr/pull/86 :-)

jflesch avatar Nov 30 '17 15:11 jflesch

Sorry, I meant to keep this ticket open for the confidence score of the whole page.

jflesch avatar Nov 30 '17 15:11 jflesch
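
Until a whole-page score exists, one possible workaround is to aggregate the per-word values yourself. The sketch below takes a plain mean over non-empty words; that choice is an assumption, not something Tesseract or pyocr defines. `FakeBox` is a hypothetical stand-in for a word box, and the attribute names `content` and `confidence` are assumed rather than confirmed in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a word box; the attribute names are assumptions.
FakeBox = namedtuple("FakeBox", ["content", "confidence"])


def page_confidence(boxes):
    """Return the mean confidence of the non-empty words, or None if there are none."""
    scores = [box.confidence for box in boxes if box.content.strip()]
    if not scores:
        return None
    return sum(scores) / len(scores)


boxes = [FakeBox("Hello", 96), FakeBox("world", 71), FakeBox(" ", 0)]
print(page_confidence(boxes))
```

A mean treats every word equally; weighting by word length or taking the minimum would be just as defensible, depending on what the score is used for.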

@a-pagano's changes have been released in pyocr 0.5.

jflesch avatar Dec 14 '17 21:12 jflesch