
Latin text not rendered in OCRed text

Open mirjam-amsterdam opened this issue 5 years ago • 2 comments

https://ocr.yiddishbookcenter.org/contents?doc=nybc202767#page24

Latin text within the Yiddish text is not rendered, and there is also no placeholder indicating that some text is missing and that the reader should consult the original scan (before preparing an e-book, before quoting, etc.).

[Screenshots: latin text missing 2; latin text missing 1]

mirjam-amsterdam, Jun 04 '19 21:06

Yes, in the early analyses I made the mistake of configuring a "junk setting", which ignores text if its confidence score is too low. This means certain passages (typically those in other alphabets) are simply skipped. In the newer analyses this should no longer be the case. However, I'd rather wait for the new version of Jochre to fix this, as that version should be able to handle multiple alphabets.
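To illustrate the difference between silently skipping low-confidence text and marking it, here is a minimal sketch. It is not Jochre's actual code or configuration: the `Word` class, the `render` function, the threshold value, and the `[?]` marker are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A recognized word with its OCR confidence (hypothetical structure)."""
    text: str
    confidence: float


def render(words: List[Word], threshold: float = 0.5,
           placeholder: Optional[str] = None) -> str:
    """Join words whose confidence meets the threshold.

    With placeholder=None, low-confidence words are dropped silently,
    which mimics the "junk setting" behaviour described above.
    With a placeholder, each run of dropped words is replaced by one marker,
    so the reader can see that something is missing.
    """
    out: List[str] = []
    for word in words:
        if word.confidence >= threshold:
            out.append(word.text)
        elif placeholder is not None and (not out or out[-1] != placeholder):
            # Collapse consecutive low-confidence words into a single marker.
            out.append(placeholder)
    return " ".join(out)


# Assumed sample data: two Latin-script words recognized with low confidence.
words = [
    Word("אין", 0.9),
    Word("Wien", 0.1),
    Word("Mansch", 0.2),
    Word("געווען", 0.8),
]

skipped = render(words)                     # low-confidence words vanish
marked = render(words, placeholder="[?]")   # a gap marker is left behind
```

With a marker in place, a reader would at least know to check the original scan at that point, even before multi-alphabet recognition is available.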

urieli, Jun 12 '19 20:06

I stumbled over a misreading: when searching for מאַנש, I get a result that is actually in Latin letters: "Wien"! Please make Latin letters searchable and show them as Latin letters in the text. And don't give me false results when I am looking for Mansch...

[Screenshots: Wien; source of Mansch - Wien]

mirjam-amsterdam, Aug 29 '22 06:08