
models: Document how to set TESSDATA_PREFIX to be compatible with ocrd resmgr

Open kba opened this issue 4 years ago • 1 comments

In essence, add

export TESSDATA_PREFIX=$XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize

to the .bashrc/.zshrc.
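
A slightly more defensive variant of that export line, as a minimal sketch: it assumes that `XDG_DATA_HOME` may be unset in the login shell, and falls back to `$HOME/.local/share`, the default named by the XDG Base Directory specification. Whether ocrd resmgr resolves its data location the same way is an assumption here, not something confirmed in this thread.

```shell
# Sketch for ~/.bashrc or ~/.zshrc.
# Assumption: if XDG_DATA_HOME is unset, use the XDG default ~/.local/share.
export TESSDATA_PREFIX="${XDG_DATA_HOME:-$HOME/.local/share}/ocrd-resources/ocrd-tesserocr-recognize"
```

After opening a new shell, `tesseract --list-langs` should list the models found in that directory, provided models have already been downloaded there.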

cf. https://github.com/OCR-D/ocrd_all/pull/240#discussion_r588179361

kba, Mar 05 '21 10:03

Is this really a documentation issue? IIRC exporting that variable to the shell is helpful to make the Tesseract standalone CLI use the OCR-D data directories. IINM this could simply be done in ocrd_all's venv automatically.

However, I do think that the current formulation in models.md for Tesseract is outdated:

> If the default location (virtualenv) is not the place you want to use for tesseract models, consider changing the default location in the OCR-D config file.

There is no virtualenv location, the default is data now. Also, there is no config file for this anymore (config.yml has been abandoned). Perhaps this should now read:

> That single location is restricted to data (so downloading models to system or cwd is useless). But if the default location is not the place you want to use for Tesseract models, consider either redefining XDG_DATA_HOME or using the environment variable TESSDATA_PREFIX as an override.

bertsky, Aug 25 '21 13:08

Again, outdated. TESSDATA_PREFIX should not be necessary anymore.

> There is no virtualenv location, the default is data now.

Default is module now (precompiled or backed by TESSDATA_PREFIX as optional override).

> Also, there is no config file for this anymore (config.yml has been abandoned). Perhaps this should now read:

There is still resources.yml though...

bertsky, Mar 16 '23 12:03

Superseded by OCR-D/ocrd_all#378

kba, Apr 25 '23 12:04