Robert Sachunsky
This adds `RIL_SYMBOL` bboxes and text to the ALTO output via `Glyph`, which was introduced with v4, hence the namespace update. Looking at the changelog of the schema XSD, I...
AFAICS, `lstmtraining` produces two types of figures for measuring the error: 1. **bag-of-character training error** (on `list.train`): this is shown as - `char train=%.3f%%` every 100 iterations - `Finished! Error...
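To make the term concrete, here is a minimal sketch of what a bag-of-characters error can look like: it compares only how often each character occurs in the ground truth and in the recognized text, ignoring order. The function name `bag_of_chars_error` and the handling of empty ground truth are assumptions for illustration; this is not claimed to be the exact computation `lstmtraining` performs.

```python
from collections import Counter


def bag_of_chars_error(truth: str, ocr: str) -> float:
    """Order-insensitive character error (hypothetical helper, illustration only).

    Counts each character in the ground truth, subtracts the counts found in
    the OCR result, and sums the absolute differences, normalised by the
    length of the ground truth.
    """
    counts = Counter(truth)
    counts.subtract(ocr)  # counts may go negative for surplus OCR characters
    errors = sum(abs(n) for n in counts.values())
    if not truth:
        # Assumption: empty truth is an error only if the OCR output is non-empty.
        return 0.0 if errors == 0 else 1.0
    return errors / len(truth)
```

Because the measure ignores order, a transposition such as `cba` for `abc` scores as zero error, whereas a single substituted character counts twice (one missing, one surplus). That makes it a rough figure compared with an edit-distance-based character error rate.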
(sorry for the noise, shouldn't have tried to rename my branches while the PR was open – that closed them!) revival of #260 (all comments there still apply)
With the current master, I cannot `pip install` anymore: ``` Building wheel for tesserocr (setup.py) ... error ERROR: Command errored out with exit status 1: command: /data/venv/bin/python3 -u -c 'import...
@sirfz you have [mentioned before](https://github.com/sirfz/tesserocr/issues/96#issuecomment-374637623) that using `SetImageFile` can be better than `SetImage` when doing layout analysis. I can fully confirm that. There's a big difference for JPEG files between...
Not sure if this is related to #53: why is the current default `NORM_MODE` set to 2 for non-Indic, non-RTL languages? Shouldn't this be 1? Also, the decision tree looks quite...
(because training fails if a .unicharset has already been created previously, but for a different START_MODEL)
By default tesstrain builds vanilla tesseract / lstmtraining, which IINM links against OpenMP. I know @stweil argued repeatedly for disabling OpenMP for prediction in the mass production / batch scenario,...
How about aiding users in [cut-off training](https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#training-just-a-few-layers), besides finetuning and from scratch? We already have `START_MODEL` and `NET_SPEC`, so IIUC we would minimally only need to add some additional variable,...
This adds proper setuptools packaging under the (free) name `pix2pixhd`, using the meta-data available. (If you want, I can make a release on PyPI, too.) Builds on the following PRs:...