Robert Sachunsky comments

Results: 735 comments of Robert Sachunsky

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

@stweil this needs to be merged – please review

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

This includes essential fixes and has been hanging here for over a year for no reason. Any objections to merging?

Add --vertical_fontlist option to tesstrain.py

Looks like this was closed accidentally

Page level images

It is possible to get polygon-based segmentation from Tesseract: with `BlockPolygon` from the page iterator delivered by `AnalyseLayout`. There is a bug somewhere though: sometimes, paths self-intersect, which even Tesseract...
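A minimal sketch of that approach, assuming the `tesserocr` Python binding (where `AnalyseLayout()` returns a page iterator offering `BlockPolygon()` and `Next(level)`). The helper names `block_polygons` and `self_intersects` are made up for illustration; the second one is just a plain geometric check for spotting the self-intersecting outlines mentioned above, not anything Tesseract provides.

```python
def block_polygons(api, block_level):
    """Collect one outline (list of (x, y) points) per block.

    `api` is assumed to be a tesserocr.PyTessBaseAPI with an image set,
    `block_level` is assumed to be tesserocr.RIL.BLOCK.
    """
    polygons = []
    it = api.AnalyseLayout()
    if it is None:  # no layout found
        return polygons
    while True:
        polygon = it.BlockPolygon()
        if polygon:
            polygons.append(polygon)
        if not it.Next(block_level):
            break
    return polygons


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _cross(a, b, c, d):
    """True if segments ab and cd properly cross (shared endpoints ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def self_intersects(polygon):
    """True if any two non-adjacent edges of the closed polygon cross."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge are adjacent
            if _cross(*edges[i], *edges[j]):
                return True
    return False


# Usage (requires tesserocr and an image file):
# from tesserocr import PyTessBaseAPI, RIL
# with PyTessBaseAPI() as api:
#     api.SetImageFile("page.png")
#     for polygon in block_polygons(api, RIL.BLOCK):
#         print(self_intersects(polygon), polygon)
```

The check is quadratic in the number of points, which should be fine for block outlines; it only reports proper crossings, not edges that merely touch.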

line_num with api.GetComponentImages(RIL.WORD, True)

Just use `RIL.TEXTLINE` instead of `RIL.WORD` and use `enumerate` for counting. If you want _both_ the textline and the word images, then I recommend using the page/result iterator directly (for...
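Roughly like this, as a sketch: the helper name `numbered_lines` is made up, and it assumes `GetComponentImages` yields `(image, box, block_id, paragraph_id)` tuples as in the `tesserocr` binding. The level is passed in so the function itself has no hard dependency on the binding.

```python
def numbered_lines(api, level):
    """Yield (line_num, image, box) for every text component at `level`.

    `api` is assumed to be a tesserocr.PyTessBaseAPI with an image set,
    `level` is assumed to be tesserocr.RIL.TEXTLINE.
    """
    components = api.GetComponentImages(level, True)
    for line_num, (image, box, _block_id, _paragraph_id) in enumerate(components):
        yield line_num, image, box


# Usage (requires tesserocr and an image file):
# from tesserocr import PyTessBaseAPI, RIL
# with PyTessBaseAPI() as api:
#     api.SetImageFile("page.png")
#     for line_num, image, box in numbered_lines(api, RIL.TEXTLINE):
#         print(line_num, box["x"], box["y"], box["w"], box["h"])
```

The counter starts at 0; pass `start=1` to `enumerate` if 1-based line numbers are preferred.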

Parallel requests increases time

Also, because of the GIL I recommend using multiprocessing instead of multithreading. For the details it depends on whether you want to do batch processing (like on a bunch of files)...
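For the batch case, a minimal sketch could look like the following. It assumes the `tesserocr` binding and its `file_to_text` helper; the function names `recognize_file` and `recognize_batch` and the `pool_cls` parameter are made up for illustration.

```python
from multiprocessing import Pool


def recognize_file(path):
    """Worker: OCR one image file and return (path, text)."""
    import tesserocr  # imported lazily, inside the worker process
    return path, tesserocr.file_to_text(path)


def recognize_batch(paths, worker=recognize_file, processes=None, pool_cls=Pool):
    """Map `worker` over `paths` with a pool and return {path: text}.

    `processes=None` lets the pool pick one worker per CPU core.
    `pool_cls` can be swapped (e.g. for multiprocessing.pool.ThreadPool)
    when debugging without separate processes.
    """
    with pool_cls(processes) as pool:
        return dict(pool.map(worker, paths))


# Usage (requires tesserocr and the image files):
# if __name__ == "__main__":
#     texts = recognize_batch(["page1.png", "page2.png"], processes=4)
#     for path, text in texts.items():
#         print(path, len(text))
```

Each worker process gets its own interpreter and its own Tesseract instance, so the GIL does not serialize the work. The `__main__` guard matters on platforms that spawn rather than fork worker processes.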

use norm_mode 1 as default

Plus (just to be sure): Am I correct in assuming that under mode 2, combining characters get recoded as an extra symbol, whereas under mode 1 they are merged with the base character?

use norm_mode 1 as default

The decision seems to derive from this commit: c90cd3f27acbacc8d30db1b44d1c017aecc7bf20. @wrznr could you please elaborate on the kind of feedback you gave (or link to it)?

use norm_mode 1 as default

> @wrznr could you please elaborate on the kind of feedback you gave (or link to it)?

Answer (on another channel): [here](https://github.com/tesseract-ocr/tesstrain/pull/118#discussion_r341633096) – a simple question. IMHO the response should...

use norm_mode 1 as default

@Shreeshrii it seems the original deviation regarding `--norm_mode` default came from [changes](https://github.com/tesseract-ocr/tesstrain/pull/15/files) proposed by you (introducing finetuning here). Could you please elaborate on your choice?