Incomplete Word, Nothing to Correct - there should be a way to flag such cases

Open markhdavid opened this issue 1 year ago • 0 comments

There should be a way to flag a word that cannot be fixed by correction in the narrow sense (a few misread characters) because part of the word is missing entirely.

Here's a case of an incomplete word to correct. It's on this page:

https://archive.org/details/doslidfundemyidi00rose/page/n72/mode/1up

This is the OCR for the same:

https://ocr.yiddishbookcenter.org/contents?doc=doslidfundemyidi00rose#page73

It shows a fragment of a word, supposedly עט, but this is in fact just the last two letters of the word. The entire word on the page is אַרבעט, but the graphic shown for correction contains only the last two letters. So how can this be corrected? There should be a way to flag this "word" as needing to be rescanned completely. It would make no sense to correct the image of just "עט" to "אַרבעט".

Here are images:

The word in context, with the actual entire word surrounded in red and the fragment mistaken for an entire word highlighted in gray: [image: bad fragment in context, screenshot 2023-12-27 9:20:00 AM]

The correction dialog for this fragment of the word: [image: bad fragment, screenshot 2023-12-27 9:18:54 AM]

OK, I see the instruction in the correction dialog:

אױב אַ װאָרט איז שלעכט סעגמענטירט (ד“ה אױב נאָר אַ טײל פֿונעם װאָרט באַװײַזט זיך אױבן), טאָר מען עס נישט אױסבעסערן.

(translation: if a word is badly segmented, i.e., if only a part of the word shows up above, you must not correct it), but what are you supposed to do? There should be a way to flag such cases, so this stuff can get corrected. What's the plan?

markhdavid, Dec 27 '23 17:12