Amit Dovev comments

Results: 538 comments of Amit Dovev

OCR of binarized image does not lead to same result as OCR of input image

>1. image -> Tesseract preprocessing and binarization -> intermediate image -> Tesseract recognition -> text result 1

Tesseract first does layout analysis and then does text recognition on the text...

require an update of japanese trained data

You can try the Japanese.traineddata from the best or fast repo.

require an update of japanese trained data

https://github.com/tesseract-ocr/tessdata_fast/tree/master/script
https://github.com/tesseract-ocr/tessdata_best/tree/master/script

Duplicate and incomplete data for German fraktur

>It is unclear who invented the name frk for Frankish. Maybe it should be renamed.

frk is the ISO 639-3 code for Frankish.

Duplicate and incomplete data for German fraktur

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files FYI, the source of the 'Language' column in the tables is the old google code download page. Ray uploaded the official traineddata files to that old page, Zdenko added...

Duplicate and incomplete data for German fraktur

It seems frk is trained using a modern German corpus and a small number of fonts.

Duplicate and incomplete data for German fraktur

@stweil, maybe you want to close this issue?

Duplicate and incomplete data for German fraktur

Is 'frk' only for German Fraktur?

Different result using tesserocr and command line version

The default psm when using the command line is 3 (Auto layout analysis). The default psm when using the C++ API is 6 (Single Block).
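One way to rule out the page segmentation mode as the cause of such a difference is to set it explicitly on both sides instead of relying on the defaults. The following is only a minimal sketch for the command-line side: `build_tesseract_cmd` and the file name `page.png` are hypothetical, and it assumes a Tesseract 4 or later binary, where the option is spelled `--psm`.

```python
import shlex

# PSM values as listed by `tesseract --help-psm`.
PSM_AUTO = 3          # fully automatic page segmentation, no OSD
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text


def build_tesseract_cmd(image_path, output_base="stdout", psm=PSM_AUTO, lang="eng"):
    """Return an argv list that pins the page segmentation mode explicitly.

    Hypothetical helper for illustration; it only builds the command and
    does not run Tesseract.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    # Print a command that uses the single-block mode on the command line.
    print(shlex.join(build_tesseract_cmd("page.png", psm=PSM_SINGLE_BLOCK)))
```

The resulting list can be passed to `subprocess.run` as is. Running the same image with the same psm through both the command line and the API makes any remaining difference easier to attribute to something other than layout analysis.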

Best Traineddata Feedback - Persian

Everyone who tests the new best traineddata should also update Tesseract to the latest commit.