Amit Dovev
>1. image -> Tesseract preprocessing and binarization -> intermediate image -> Tesseract recognition -> text result 1

Tesseract first does layout analysis and then does text recognition on the text...
You can try the Japanese.traineddata from the best or fast repo.
https://github.com/tesseract-ocr/tessdata_fast/tree/master/script https://github.com/tesseract-ocr/tessdata_best/tree/master/script
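As a rough sketch of how a script model such as Japanese.traineddata might be used once it has been downloaded: the helper below only builds the command line and does not run Tesseract. The function name `script_model_command` and the directory `./tessdata` are hypothetical, chosen for illustration; `--tessdata-dir` and `-l` are standard options of the `tesseract` command-line tool.

```python
from pathlib import Path


def script_model_command(image_path, tessdata_dir, script="Japanese"):
    """Return (expected model path, argv list) for OCR with a script model.

    Assumes <script>.traineddata has been saved directly inside tessdata_dir
    (a hypothetical local directory, not a path mandated by Tesseract).
    """
    model_path = Path(tessdata_dir) / f"{script}.traineddata"
    command = [
        "tesseract",
        str(image_path),
        "stdout",                 # write recognized text to standard output
        "--tessdata-dir", str(tessdata_dir),
        "-l", script,             # model name without the .traineddata suffix
    ]
    return model_path, command


if __name__ == "__main__":
    model, cmd = script_model_command("page.png", "./tessdata")
    print("expects model at:", model)
    print("command:", " ".join(cmd))
```

The returned list can be passed to `subprocess.run` once the model file is in place.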
>It is unclear who invented the name frk for Frankish. Maybe it should be renamed.

frk is the ISO 639-3 code for Frankish.
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files FYI, the source of the 'Language' column in the tables is the old Google Code download page. Ray uploaded the official traineddata files to that old page, Zdenko added...
It seems frk was trained on a modern German corpus with only a small number of fonts.
@stweil, maybe you want to close this issue?
Is 'frk' only for German Fraktur?
The default psm when using the command line is 3 (automatic page segmentation, no OSD). The default psm when using the C++ API is 6 (`PSM_SINGLE_BLOCK`, a single uniform block of text).
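Because the command line and the API start from different defaults, passing the page segmentation mode explicitly makes results comparable between the two. A minimal sketch, assuming Tesseract 4 or later (where the option is spelled `--psm` and accepts values 0 to 13); the helper name `tesseract_command` is hypothetical, and the code only builds the command without running it:

```python
import shlex

# Mode numbers as listed by `tesseract --help-psm`.
PSM_AUTO = 3          # fully automatic page segmentation, no OSD
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text


def tesseract_command(image_path, psm, lang="eng"):
    """Build an argv list for the tesseract CLI with an explicit --psm value."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    # Same image, two segmentation modes, for a side-by-side comparison.
    for mode in (PSM_AUTO, PSM_SINGLE_BLOCK):
        print(shlex.join(tesseract_command("page.png", mode)))
```

Running both commands on the same image is a quick way to check whether a difference in output comes from layout analysis rather than from recognition.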
Everyone who tests the new best traineddata should also update Tesseract to the latest commit.