Amit Dovev
Before training, he should try the best/fast jav.traineddata.
Did you unpack jav from best/fast?
>Encoding of string failed! Failure bytes: ffffffea ffffffa6 ffffff83 ffffffea ffffffa6 ffffffa9 ffffffea ffffffa6 ffffffb2 ffffffea ffffffa6 ffffffb8 ffffffea ffffffa7 ffffff88 The text is clearly not encoded in utf-8.
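One quick way to check the reported bytes is to strip what looks like sign extension and try a UTF-8 decode. This is only a sketch: it assumes the `ffffff` prefix comes from printing signed chars as hex, and the helper name `decode_failure_bytes` is made up for illustration.

```python
def decode_failure_bytes(dump: str) -> str:
    """Decode a space-separated dump of sign-extended hex bytes as UTF-8."""
    # Assumption: only the low 8 bits of each value are real data (ffffffea -> 0xea).
    raw = bytes(int(token, 16) & 0xFF for token in dump.split())
    # Raises UnicodeDecodeError if the bytes are not well-formed UTF-8.
    return raw.decode("utf-8")


# The failure bytes copied from the message above.
dump = (
    "ffffffea ffffffa6 ffffff83 ffffffea ffffffa6 ffffffa9 "
    "ffffffea ffffffa6 ffffffb2 ffffffea ffffffa6 ffffffb8 "
    "ffffffea ffffffa7 ffffff88"
)

text = decode_failure_bytes(dump)
# List the decoded code points so they can be compared with the script's Unicode block.
print([f"U+{ord(ch):04X}" for ch in text])
```

If the call returns without raising, the bytes themselves are well-formed UTF-8, and the failure may have more to do with characters missing from the unicharset than with the file encoding.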
https://en.wikipedia.org/wiki/Yiddish_orthography
https://en.wikipedia.org/wiki/Yiddish_orthography#Punctuation
So, the 3 punctuation marks I mentioned are used for Yiddish too.
05BE ־ HEBREW PUNCTUATION MAQAF
05F3 ׳ HEBREW PUNCTUATION GERESH
05F4 ״ HEBREW PUNCTUATION GERSHAYIM
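The code points and names of the three marks can be checked with Python's standard `unicodedata` module. A minimal sketch; the helper name `describe` is just for illustration.

```python
import unicodedata

# The three Hebrew punctuation marks mentioned above.
MARKS = [0x05BE, 0x05F3, 0x05F4]


def describe(code_point: int) -> str:
    """Return 'HEX CHAR NAME' for one code point, using the Unicode database."""
    char = chr(code_point)
    return f"{code_point:04X} {char} {unicodedata.name(char)}"


for cp in MARKS:
    print(describe(cp))
```

The same loop could be pointed at a training text to see whether these marks actually occur in it.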
I'm still not sure about the nikud for Yiddish. It seems that it does not use all the Hebrew nikud signs.
Actually, it seems that there are quite a lot of Hebrew words in Yiddish.
https://github.com/tesseract-ocr/langdata/issues/59#issuecomment-294256255
theraysmith commented:
> Thanks for opening the new issues 62, 63. I will continue to think about the best approach. I tried to include TM in the current round of...
https://github.com/tesseract-ocr/tesseract/commit/b0ead95d64a366
https://github.com/tesseract-ocr/tesseract/pull/1784#issue-342279018
> @stweil commented on 18 Jul 2018
>
> With the new API it will be very easy to switch to a fixed little endian file format as soon as it...