RFC: Remove tessdata directory and replace it by a submodule
Signed-off-by: Stefan Weil [email protected]
This is for further discussion.
The `tessdata` subdirectory is now filled with data from https://github.com/tesseract-ocr/tessconfigs, which is included as a submodule, so for people who just clone the tesseract repository it is empty. Therefore `tessdata` is no longer included in the build process as a subdirectory, and `make install` currently does not install `tessdata` files. There is a new Makefile target: `make install-tessdata` will install them or fail if the submodule was not cloned.
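For anyone who wants to try this, here is a minimal sketch of the workflow described above. It assumes the submodule path is `tessdata` and that the tree has already been configured. The helper names `tessdata_ready` and `install_tessdata` are hypothetical and are not part of this PR.

```shell
#!/bin/sh
# Sketch only: tessdata_ready and install_tessdata are hypothetical helper
# names, and the submodule path "tessdata" is an assumption.

# Succeeds if the given directory exists and is non-empty.
# A submodule that was never initialized leaves an empty directory behind.
tessdata_ready() {
  dir="${1:-tessdata}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Fetch the submodule if it is missing, then run the new install target.
install_tessdata() {
  tessdata_ready tessdata || git submodule update --init tessdata || return 1
  make install-tessdata
}
```

Calling `install_tessdata` from the top of the source tree would then cover both cases. Cloning with `git clone --recurse-submodules` in the first place should make the extra step unnecessary.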
The problems with AppVeyor look unrelated to my modifications.
Why? I don't see the benefit here. IMO, it's just complicating the setup for the poor user.
@amitdo: poor users do not use git ;-) IMO this is the start of "rethinking" the best way to distribute tesseract. From tessdata, only pdf.ttf is really needed (maybe it could be compiled into the library?). Personally, I would prefer splitting this project (repository) into more individual parts that could be consolidated into one repository (for poor users ;-) ). There are at least 3 parts that could exist separately:
- libtesseract
- tesseract executable
- tesseract training tools
@stweil, if you still want to do it, go ahead. Otherwise, maybe it's time to close this PR.
@stweil,
https://github.com/tesseract-ocr/tesseract/pull/2459#issuecomment-830689263