Robert Sachunsky
@stweil this needs to be merged – please review
This includes essential fixes and has been hanging here for over a year for no reason. Any objections to merging?
Looks like this was closed accidentally
It is possible to get polygon-based segmentation from Tesseract: with `BlockPolygon` from the page iterator delivered by `AnalyseLayout`. There is a bug somewhere though: sometimes, paths self-intersect, which even Tesseract...
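
A minimal sketch of how such self-intersecting outlines could be detected before they are passed on. It assumes the polygon is available as a plain list of `(x, y)` tuples; the function names are hypothetical and not part of Tesseract or any binding. Only proper crossings between non-adjacent edges are reported, so touching or collinear overlapping edges are not flagged.

```python
def _orient(a, b, c):
    """Signed area of triangle a-b-c (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _properly_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def self_intersects(points):
    """Check a closed polygon (list of (x, y) tuples) for crossing edges."""
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _properly_cross(*edges[i], *edges[j]):
                return True
    return False
```

The check is quadratic in the number of vertices, which should be acceptable for block outlines with a few dozen points.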
Just use `RIL.TEXTLINE` instead of `RIL.WORD` and use `enumerate` for counting. If you want _both_ the textline and the word images, then I recommend using the page/result iterator directly (for...
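
A rough sketch of the `enumerate` part, assuming the components come as `(image, box, block_id, paragraph_id)` tuples, which is the shape tesserocr's `GetComponentImages` returns. The helper name `label_components`, the file name pattern, and `page.png` are made up for illustration.

```python
def label_components(components, prefix="line"):
    """Pair each component with a zero-padded running index.

    `components` is assumed to be an iterable of
    (image, box, block_id, paragraph_id) tuples.
    """
    labelled = []
    for i, (image, box, _block_id, _paragraph_id) in enumerate(components):
        labelled.append((f"{prefix}_{i:04d}.png", image, box))
    return labelled


# Usage with tesserocr (assumed to be installed; not executed here):
#
#   from tesserocr import PyTessBaseAPI, RIL
#   with PyTessBaseAPI() as api:
#       api.SetImageFile("page.png")  # hypothetical input file
#       lines = api.GetComponentImages(RIL.TEXTLINE, True)
#       for name, image, box in label_components(lines):
#           image.save(name)
```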
Also, because of the GIL, I recommend using multiprocessing instead of multithreading. For the details it depends on whether you want to do batch processing (like on a bunch of files)...
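
For the batch case, a minimal sketch with the standard library's `multiprocessing.Pool` could look like this. `process_file` is a hypothetical stand-in for the real per-file OCR job, and `run_batch` is likewise an illustrative name.

```python
import os
from multiprocessing import Pool


def process_file(path):
    """Hypothetical per-file job; a real one would run OCR on `path`.

    Any OCR engine handle should be created inside this function,
    so that every worker process owns its own instance.
    """
    return path, len(os.path.basename(path))


def run_batch(paths, workers=None):
    """Distribute the files over worker processes (default: one per CPU)."""
    with Pool(processes=workers) as pool:
        return pool.map(process_file, paths)
```

When run as a script, `run_batch` should be called from under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module.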
Plus (just to be sure): Am I correct in assuming that under 2, combining characters get recoded as an extra symbol, whereas under 1 they are merged with the base character?
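
To make the two behaviours in this question concrete, here is a small sketch using Python's `unicodedata`. It is not Tesseract's implementation and makes no claim about what either mode actually does; it only shows what "extra symbol" versus "merged with the base character" would mean for a decomposed string. Both function names are hypothetical.

```python
import unicodedata


def split_symbols(text):
    """Every code point, including combining marks, is its own symbol."""
    return list(unicodedata.normalize("NFD", text))


def merged_symbols(text):
    """Combining marks are attached to the preceding base character."""
    units = []
    for ch in unicodedata.normalize("NFD", text):
        if units and unicodedata.combining(ch):
            units[-1] += ch  # non-zero combining class: extend previous unit
        else:
            units.append(ch)
    return units
```

For `"\u00e9"` (e with acute), `split_symbols` yields two symbols, the base `e` and U+0301, while `merged_symbols` yields a single unit containing both code points.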
The decision seems to derive from here: c90cd3f27acbacc8d30db1b44d1c017aecc7bf20. @wrznr could you please elaborate on the kind of feedback you gave (or link to it)?
> @wrznr could you please elaborate on the kind of feedback you gave (or link to it)?

Answer (on another channel): [here](https://github.com/tesseract-ocr/tesstrain/pull/118#discussion_r341633096) – a simple question. IMHO the response should...
@Shreeshrii it seems the original deviation regarding `--norm_mode` default came from [changes](https://github.com/tesseract-ocr/tesstrain/pull/15/files) proposed by you (introducing finetuning here). Could you please elaborate on your choice?