Amit Dovev
>Maybe `paragraph_line_separator`? Confusing name, IMO. https://github.com/tesseract-ocr/tesseract/blob/5d5ae6a96392d77b9a430a4981e97a040487ac25/include/tesseract/ltrresultiterator.h#L89-L91
Tesseract's layout analysis module can't handle touching blocks or text lines.
This issue is unlikely to be fixed in the foreseeable future.
Did you try Arabic.traineddata?
`tessedit_char_whitelist=\u8f09` AFAIK, this usage is not supported. Did you try `tessedit_char_whitelist=載`?
Anyway, the allowlist / denylist feature is known to not work well with the LSTM engine.
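To illustrate why the two spellings differ, here is a small standalone sketch (it does not call Tesseract). It assumes the value reaches the engine as literal bytes, so `\u8f09` stays six ASCII characters, while `載` is the three-byte UTF-8 sequence for U+8F09. The helper names `encodeUtf8`, `escapedValue` and `characterValue` are hypothetical, made up for this example.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (no validation of surrogates
// or out-of-range values; enough for this illustration).
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The value as typed with an escape: backslash, 'u', '8', 'f', '0', '9'.
// Assumption: nothing on the way to the engine decodes this escape.
std::string escapedValue() { return "\\u8f09"; }

// The value as the actual character U+8F09, encoded as UTF-8.
std::string characterValue() { return encodeUtf8(0x8F09); }
```

Under that assumption, the escaped form would allow the characters `\`, `u`, `8`, `f`, `0` and `9` rather than `載`.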
I reviewed this patch. It looks right, but too verbose :( I approve it.
Another option is to add
```
struct boxGeometry {
    int32_t x;
    int32_t y;
    int32_t w;
    int32_t h;
};
```
and `struct boxGeometry boxGetGeometryAsStruct()` to Leptonica's public interface. In C++17 code...
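A minimal sketch of how C++17 code might consume such a struct, using structured bindings. This is only my guess at one possible use, not something the comment spells out. `DemoBox`, `demoGetGeometryAsStruct` and `demoContains` are hypothetical stand-ins; the proposed `boxGetGeometryAsStruct()` does not exist in Leptonica as far as this thread shows, and its parameter list is not specified here.

```cpp
#include <cstdint>

// The struct as proposed in the comment above.
struct boxGeometry {
    std::int32_t x;
    std::int32_t y;
    std::int32_t w;
    std::int32_t h;
};

// Hypothetical stand-in for a Leptonica box, so the sketch is self-contained.
struct DemoBox {
    std::int32_t x, y, w, h;
};

// Hypothetical stand-in for the proposed boxGetGeometryAsStruct().
boxGeometry demoGetGeometryAsStruct(const DemoBox& box) {
    return {box.x, box.y, box.w, box.h};
}

// C++17 structured bindings unpack all four fields in one declaration.
bool demoContains(const DemoBox& box, std::int32_t px, std::int32_t py) {
    const auto [x, y, w, h] = demoGetGeometryAsStruct(box);
    return px >= x && px < x + w && py >= y && py < y + h;
}
```

Returning a plain struct by value keeps the function usable from C, while C++17 callers can skip four separate output parameters.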
@stweil, I don't understand your comment. We have required C++17 since 5.0.0.
The plan is to disable it by default in 5.1.0.