
RFC: Remove tessdata directory and replace it by a submodule

Open: stweil opened this issue 5 years ago • 6 comments

Signed-off-by: Stefan Weil

stweil, May 23, 2019, 20:05

This is for further discussion.

The tessdata subdirectory is now populated from https://github.com/tesseract-ocr/tessconfigs, which is included as a git submodule. Anyone who simply clones the tesseract repository without that submodule therefore gets an empty directory. For that reason, tessdata is no longer part of the build process as a subdirectory, and make install currently does not install any tessdata files. A new Makefile target, make install-tessdata, installs those files, or fails if the submodule was not cloned.

stweil, May 23, 2019, 20:05

The AppVeyor failures look unrelated to my changes.

stweil, May 23, 2019, 20:05

Why? I don't see the benefit here. IMO, it just complicates the setup for the poor user.

amitdo, May 23, 2019, 22:05

@amitdo: poor users do not use git ;-) IMO this is the start of "rethinking" the best way to distribute tesseract. From tessdata, only pdf.ttf is really needed (maybe it could be compiled into the library?). Personally, I would prefer splitting this project (repository) into more individual parts, which could still be consolidated into one repository (for poor users ;-) ). There are at least 3 parts that could exist separately:

  • libtesseract
  • tesseract executable
  • tesseract training tools

zdenop, May 24, 2019, 08:05

@stweil, if you still want to do it, go ahead. Otherwise, maybe it's time to close this PR.

amitdo, May 1, 2021, 20:05

@stweil,

https://github.com/tesseract-ocr/tesseract/pull/2459#issuecomment-830689263

amitdo, Oct 31, 2021, 20:10