jochre
Java Optical CHaracter Recognition
Probably for version 3.0: nybc210852 has some difficult characters. Waw-Yud isn't well recognised (screenshot), but worse, this YIVO publication has the special character for double-waw (screenshot). The OCR program will need special...
How do I force line breaks? There seems to be a threshold above which hard breaks are added. How do I make each line break in the source convert to a \n?
I've tried to install Jochre on Ubuntu Linux, following the [official Installation instructions](https://github.com/urieli/jochre/wiki/Installation), but it always fails. I tried it on both `dev` and `master` branches. I'll attach logs of...
There should be a way to flag that a word is beyond needing to be corrected in the narrow sense of a few characters being misread, and it's missing something...
Compare these searches: יםקאטער yields 4 results, while both ים־קאטער and ים-קאטער yield 65. My guess is that something is different with the encoding of the hyphen/dash in these cases. Could...
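One way to check the hyphen/dash guess is to print the code points of each query. The sketch below is not part of Jochre; the class and method names (`HyphenCheck`, `codePoints`, `stripHyphens`) are hypothetical, and it assumes the two hyphenated variants use the Hebrew maqaf (U+05BE) and the ASCII hyphen-minus (U+002D) respectively. It uses only the standard `String.codePoints()` and `String.replace()` calls.

```java
import java.util.stream.Collectors;

public class HyphenCheck {
    // Hypothetical helper: render every character of s as a U+XXXX code point.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Hypothetical normalisation: drop the Hebrew maqaf and the ASCII hyphen-minus
    // so that hyphenated and unhyphenated spellings compare equal.
    static String stripHyphens(String s) {
        return s.replace("\u05BE", "").replace("-", "");
    }

    public static void main(String[] args) {
        // The three queries, written with escapes to avoid bidi display issues.
        String plain = "\u05D9\u05DD\u05E7\u05D0\u05D8\u05E2\u05E8";
        String withMaqaf = "\u05D9\u05DD\u05BE\u05E7\u05D0\u05D8\u05E2\u05E8";   // assumed U+05BE
        String withHyphen = "\u05D9\u05DD-\u05E7\u05D0\u05D8\u05E2\u05E8";       // assumed U+002D

        for (String query : new String[] {plain, withMaqaf, withHyphen}) {
            System.out.println(codePoints(query));
        }

        // After normalisation all three strings are identical.
        System.out.println(stripHyphens(withMaqaf).equals(plain));
        System.out.println(stripHyphens(withHyphen).equals(plain));
    }
}
```

If the two hyphenated queries print different code points yet return the same 65 hits, the index is probably already treating both characters alike, and the difference from the 4 hits would lie in how the unhyphenated form is tokenised rather than in the hyphen encoding itself.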
Bumps [ch.qos.logback:logback-classic](https://github.com/qos-ch/logback) from 1.2.4 to 1.3.12. Commits: 0df4ec1 prepare release 1.3.12; 189af50 ensure JDK 8 compatibility; 14a71d0 cater for array size marked with -1; b8eac23 prevent DOS attacks using on...
Probably caused by the ornaments. But the table isn't correct either. E.g. the numbering is missing. https://ocr.yiddishbookcenter.org/contents?doc=nybc212765
I wanted to correct a word. The opening parenthesis is seen as a letter. But since the closing one is not part of the word, I fear that I might do more...
Searched for שעהן נעשטאַלט in nybc212765. Tried to add a colon (:) at the end of the word זשיטניצקי - it has זשיטניצקיז when I press the word for correction...
I just went over a section of nybc200407, p. 11f. I noticed that some words were reproduced in a standardized form, whereas the Yiddish source clearly has a non-modern YIVO-klal-spelling....