Amit Dovev comments

Results: 538 comments of Amit Dovev

Add Javanese Script for jav-java

Before training, he should try the best/fast jav.traineddata.

Add Javanese Script for jav-java

Did you unpack jav from best/fast?

Add Javanese Script for jav-java

>Encoding of string failed! Failure bytes: ffffffea ffffffa6 ffffff83 ffffffea ffffffa6 ffffffa9 ffffffea ffffffa6 ffffffb2 ffffffea ffffffa6 ffffffb8 ffffffea ffffffa7 ffffff88

The text is clearly not encoded in UTF-8.

Yiddish

https://en.wikipedia.org/wiki/Yiddish_orthography

Yiddish

https://en.wikipedia.org/wiki/Yiddish_orthography#Punctuation

So, the 3 punctuation marks I mentioned are used for Yiddish too.

05BE ‫־‬ HEBREW PUNCTUATION MAQAF
05F3 ‫׳‬ HEBREW PUNCTUATION GERESH
05F4 ‫״‬ HEBREW PUNCTUATION GERSHAYIM
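The three code points listed above can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the names `HEBREW_PUNCTUATION` and `describe` are illustrative and do not come from the issue thread.

```python
import unicodedata

# The three Hebrew punctuation marks mentioned in the comment.
HEBREW_PUNCTUATION = ["\u05BE", "\u05F3", "\u05F4"]


def describe(ch: str) -> str:
    """Return 'XXXX NAME' for a single character, e.g. '05BE HEBREW PUNCTUATION MAQAF'."""
    return f"{ord(ch):04X} {unicodedata.name(ch)}"


descriptions = [describe(ch) for ch in HEBREW_PUNCTUATION]

for line in descriptions:
    print(line)
```

The same helper could be run over a training text to see which punctuation characters it actually contains.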

Yiddish

I'm still not sure about the nikud for Yiddish. It seems that it does not use all the Hebrew nikud signs.

Yiddish

Actually, it seems that there are quite a lot of Hebrew words in Yiddish.

Correct handling of TM sign

https://github.com/tesseract-ocr/langdata/issues/59#issuecomment-294256255

theraysmith commented:

>Thanks for opening the new issues 62, 63. I will continue to think about the best approach. I tried to include TM in the current round of...

Correct handling of TM sign

https://github.com/tesseract-ocr/tesseract/commit/b0ead95d64a366

Always do serialization in little endian order

https://github.com/tesseract-ocr/tesseract/pull/1784#issue-342279018

>@stweil commented on 18 Jul 2018:
>
>With the new API it will be very easy to switch to a fixed little endian file format as soon as it...
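As a rough illustration of what a fixed little endian format means, here is a sketch in Python rather than Tesseract's own C++ serialization code. It writes a 32-bit unsigned integer with an explicit `<` byte order, so the bytes are the same regardless of the host machine. The function names are hypothetical and are not part of the Tesseract API.

```python
import struct


def serialize_u32_le(value: int) -> bytes:
    """Pack an unsigned 32-bit integer in little endian order ('<' forces the byte order)."""
    return struct.pack("<I", value)


def deserialize_u32_le(data: bytes) -> int:
    """Unpack an unsigned 32-bit integer that was stored in little endian order."""
    return struct.unpack("<I", data)[0]


encoded = serialize_u32_le(1)
decoded = deserialize_u32_le(encoded)
print(encoded.hex(), decoded)
```

Because the byte order is stated explicitly instead of following the host's native order, a file written on a big endian machine would read back identically on a little endian one.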