ocrd-website
models: Document how to set TESSDATA_PREFIX to be compatible with ocrd resmgr
In essence, add
export TESSDATA_PREFIX=$XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize
to your .bashrc/.zshrc.
cf. https://github.com/OCR-D/ocrd_all/pull/240#discussion_r588179361
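A minimal sketch of that export with a fallback, in case XDG_DATA_HOME is not set in the shell. The fallback value $HOME/.local/share is the XDG Base Directory default and is an assumption here; whether ocrd resmgr resolves the directory the same way is not confirmed in this thread.

```shell
# Assumption: when XDG_DATA_HOME is unset or empty, use the XDG default.
: "${XDG_DATA_HOME:=$HOME/.local/share}"

# Point the standalone Tesseract CLI at the directory named in this issue.
export TESSDATA_PREFIX="$XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize"
```

Without the fallback, an unset XDG_DATA_HOME would expand to an empty string and TESSDATA_PREFIX would point at /ocrd-resources/ocrd-tesserocr-recognize.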
Is this really a documentation issue? IIRC exporting that variable to the shell is helpful to make the Tesseract standalone CLI use the OCR-D data directories. IINM this could simply be done in ocrd_all's venv automatically.
However, I do think that the current formulation in models.md for Tesseract is outdated:
If the default location (virtualenv) is not the place you want to use for tesseract models, consider changing the default location in the OCR-D config file.
There is no virtualenv location, the default is data now. Also, there is no config file for this anymore (config.yml has been abandoned). Perhaps this should now read:
That single location is restricted to be data (so downloading models to system or cwd is useless). But if the default location is not the place you want to use for Tesseract models, consider either redefining XDG_DATA_HOME or using the environment variable TESSDATA_PREFIX as an override.
Again, outdated. TESSDATA_PREFIX should not be necessary anymore.
There is no virtualenv location, the default is data now.
The default is module now (precompiled, or backed by TESSDATA_PREFIX as an optional override).
Also, there is no config file for this anymore (config.yml has been abandoned). Perhaps this should now read:
There is still resources.yml though...
Superseded by OCR-D/ocrd_all#378