Robert Sachunsky issues

272 issues



ALTO renderer: move to v4, add Glyphs

This adds `RIL_SYMBOL` bboxes and text to the ALTO output via `Glyph`, which was introduced with v4, hence the namespace update. Looking at the changelog of the schema XSD, I...

enhancement
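The ALTO v4 markup this change targets can be illustrated with a small Python sketch. Tesseract's actual ALTO renderer is C++ code. This example only shows one way a `String` element could carry one `Glyph` child per symbol-level (`RIL_SYMBOL`) box. The helper name `string_with_glyphs` and its input format, a list of `(char, (left, top, right, bottom))` pairs, are hypothetical.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace (Glyph is only available from v4 on, hence the update).
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
ET.register_namespace("", ALTO_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the ALTO v4 namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def _box_attrs(left: int, top: int, right: int, bottom: int) -> dict:
    """Convert a (left, top, right, bottom) box into ALTO position attributes."""
    return {
        "HPOS": str(left),
        "VPOS": str(top),
        "WIDTH": str(right - left),
        "HEIGHT": str(bottom - top),
    }


def string_with_glyphs(word: str, glyph_boxes: list) -> ET.Element:
    """Build an ALTO <String> with one <Glyph> child per symbol (hypothetical helper).

    The word's box is taken as the union of its glyph boxes.
    """
    if not glyph_boxes:
        raise ValueError("need at least one glyph box")
    boxes = [box for _, box in glyph_boxes]
    union = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    string = ET.Element(q("String"), {"CONTENT": word, **_box_attrs(*union)})
    for char, box in glyph_boxes:
        # One Glyph per recognized symbol, with its own bounding box.
        ET.SubElement(string, q("Glyph"), {"CONTENT": char, **_box_attrs(*box)})
    return string


if __name__ == "__main__":
    s = string_with_glyphs("ab", [("a", (10, 20, 18, 32)), ("b", (18, 20, 27, 32))])
    print(ET.tostring(s, encoding="unicode"))
```

Deriving the word box from the union of glyph boxes keeps the sketch self-contained. A real renderer would more likely take the word box from the word-level iterator directly.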

output true CER for checkpoints (at least the final one)

AFAICS, `lstmtraining` produces two types of figures for measuring the error: 1. **bag-of-character training error** (on `list.train`): this is shown as - `char train=%.3f%%` every 100 iterations - `Finished! Error...

feature request
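The issue distinguishes a bag-of-characters error from a true CER. The following minimal Python sketch shows why the two figures can disagree. It is a simplified model of a bag-of-characters measure: it compares only character counts and caps the result at 1.0. It is not Tesseract's exact code. The true CER shown is the common definition: edit distance divided by the ground-truth length.

```python
from collections import Counter


def bag_of_chars_error(truth: str, ocr: str) -> float:
    """Order-insensitive error: compares only how often each character occurs."""
    diff = Counter(truth)
    diff.subtract(Counter(ocr))
    errors = sum(abs(n) for n in diff.values())
    if not truth:
        return 0.0 if errors == 0 else 1.0
    return min(1.0, errors / len(truth))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cer(truth: str, ocr: str) -> float:
    """True character error rate: edit distance relative to ground-truth length."""
    if not truth:
        return 0.0 if not ocr else 1.0
    return levenshtein(truth, ocr) / len(truth)


if __name__ == "__main__":
    # A permutation has a perfect bag-of-characters score but a non-zero CER.
    print(bag_of_chars_error("abc", "cba"), cer("abc", "cba"))  # 0.0 0.6666666666666666
```

Because the bag-of-characters measure ignores character order, it can underestimate the real error. This is why reporting a true CER, at least for the final checkpoint, would be informative.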

fix API usage examples

(Sorry for the noise: I shouldn't have tried to rename my branches while the PR was open, because that closed them!) Revival of #260 (all comments there still apply).

does not compile against libtesseract anymore

With the current master, I cannot `pip install` anymore: ``` Building wheel for tesserocr (setup.py) ... error ERROR: Command errored out with exit status 1: command: /data/venv/bin/python3 -u -c 'import...

Quality degradation due to PIL.Image.save

@sirfz you have [mentioned before](https://github.com/sirfz/tesserocr/issues/96#issuecomment-374637623) that using `SetImageFile` can be better than `SetImage` when doing layout analysis. I can fully confirm that. There's a big difference for JPEG files between...

use norm_mode 1 as default

Not sure if this is related to #53: why does the current default `NORM_MODE` set 2 for non-Indic, non-RTL languages? Shouldn't this be 1? Also, the decision tree looks quite...

pinned

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

(because training fails if a .unicharset was already created earlier for a different START_MODEL)

pinned

Disable OpenMP

By default tesstrain builds vanilla tesseract / lstmtraining, which IINM links against OpenMP. I know @stweil argued repeatedly for disabling OpenMP for prediction in the mass production / batch scenario,...

pinned

support the cutoff training regime

How about aiding users in [cut-off training](https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#training-just-a-few-layers), besides finetuning and from scratch? We already have `START_MODEL` and `NET_SPEC`, so IIUC we would, at a minimum, only need to add some additional variable,...

pinned

setuptools packaging

This adds proper setuptools packaging under the (free) name `pix2pixhd`, using the available metadata. (If you want, I can make a release on PyPI, too.) Builds on the following PRs:...