Amit Dovev
Returning to your question. >In all these cases, tesseract gets a poor result. In case 1, the diacritics are in the truth text, and Tesseract gets them badly wrong....
To summarize the above: for OCR, I think (1) is preferable to (2), but neither is good. So, unless you can make the nikud recognition much better, IMO a...
heb.wordlist contains these words: >אַרטיקל- קאַװע־שטיבל בלאַט: נאַװיגאציע, אַר־עס־עס באַניצער װאָס קאַטעגאָריע אינהאַלט באַהאַלטן נאָך אַלע צוזאַמענארבעט אַרטיקל רעדאַקטירן דאָס אַריבערשליסונגען אָקטאָבער אַנאָנימע נאָר באַנוצערס אַלץ האָט [בעאַרבעטן] זאָל קאָנטאַקט...
The Hebrew word list and training text should not contain the Yiddish digraphs: 05F0 װ HEBREW LIGATURE YIDDISH DOUBLE VAV 05F1 ױ HEBREW LIGATURE YIDDISH VAV YOD 05F2 ײ HEBREW...
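A minimal sketch of how such entries could be filtered out, assuming the word list is available as a plain sequence of strings. The function names `has_yiddish_digraph` and `split_wordlist` are hypothetical helpers for illustration, not part of Tesseract or its training tools; the only code points checked are the three named above.

```python
# Code points listed in the comment above: U+05F0, U+05F1, U+05F2.
YIDDISH_DIGRAPHS = {"\u05F0", "\u05F1", "\u05F2"}


def has_yiddish_digraph(word: str) -> bool:
    """Return True if the word contains any of the three Yiddish digraphs."""
    return any(ch in YIDDISH_DIGRAPHS for ch in word)


def split_wordlist(words):
    """Split words into (kept, rejected); rejected ones contain a digraph."""
    kept, rejected = [], []
    for word in words:
        if has_yiddish_digraph(word):
            rejected.append(word)
        else:
            kept.append(word)
    return kept, rejected


# Sample input, written with escapes so the code points are unambiguous.
sample = [
    "\u05E9\u05DC\u05D5\u05DD",                # a Hebrew word, no digraphs
    "\u05F0\u05D0\u05B8\u05E1",                # starts with U+05F0
    "\u05E7\u05D0\u05B7\u05F0\u05E2",          # contains U+05F0
]
kept, rejected = split_wordlist(sample)
print(len(kept), "kept;", len(rejected), "rejected")
```

The same check could be run line by line over a word list or training text file to see how many entries are affected before deciding whether to drop them.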
Ray, how many fonts do you use for Hebrew training?
Here are two examples from Project Ben-Yehuda: Poem: http://benyehuda.org/bialik/bia060.html Prose: http://benyehuda.org/yalag/yalag_086.html
@theraysmith If you have further questions about this subject, I'll be happy to answer them.
>I have noticed that there is an older style of font, in which the letters are very rounded instead of rather square. Something like these samples: http://fontbit.co.il/search.asp?tag=3&style=13 ? That's the...
https://fonts.google.com/?subset=hebrew A list of Hebrew fonts from the Open Siddur Project http://opensiddur.org/tools/fonts/
>Prefix and suffixes. Do they just append to the base word without changing it (like cat->cats) or do they potentially change the word slightly (like lady->ladies)? If they never do...