add guide for tesseract
For a support case, I tried installing tesseract on a U7 account, following https://github.com/tesseract-ocr/tesseract/wiki/Compiling#install-elsewhere--without-root, simply to find out whether it works. Quickly sharing my results:
- Building leptonica (dependency):
$ curl -L https://github.com/DanBloomberg/leptonica/releases/download/1.79.0/leptonica-1.79.0.tar.gz | tar -xzvf -
$ cd leptonica-1.79.0
$ ./configure --prefix=$HOME/local/
$ make
$ make install
- Building tesseract:
$ curl -L https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz | tar -xzvf -
$ cd tesseract-4.1.1
$ ./autogen.sh
$ export PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig
$ LIBLEPT_HEADERSDIR=$HOME/local/include ./configure --prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
$ make
$ make install
- Voilà:
$ ~/local/bin/tesseract --version
tesseract 4.1.1
leptonica-1.79.0
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 1.0.3 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE
If anybody would want to create a guide, this might be a jumpstart.
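As a possible starting point for such a guide, the steps above could be collected into one script, roughly like this. It is only a sketch: the function names and the `PREFIX`, `LEPTONICA_VERSION` and `TESSERACT_VERSION` variables are made up for illustration, while the commands themselves are the ones listed above. I have only run the steps by hand, not in this wrapped form.

```shell
#!/bin/sh
# Sketch only: wraps the commands from this thread in functions.
# PREFIX, *_VERSION and the function names are illustrative, not part of any tool.
PREFIX="${PREFIX:-$HOME/local}"
LEPTONICA_VERSION="${LEPTONICA_VERSION:-1.79.0}"
TESSERACT_VERSION="${TESSERACT_VERSION:-4.1.1}"

# Download URLs, following the pattern of the two URLs used above.
leptonica_url() {
    echo "https://github.com/DanBloomberg/leptonica/releases/download/$LEPTONICA_VERSION/leptonica-$LEPTONICA_VERSION.tar.gz"
}

tesseract_url() {
    echo "https://github.com/tesseract-ocr/tesseract/archive/$TESSERACT_VERSION.tar.gz"
}

# Each function runs in a subshell, so the cd does not leak into the caller.
# The steps are chained with && so a failing step stops the rest.
build_leptonica() (
    curl -L "$(leptonica_url)" | tar -xzf - &&
    cd "leptonica-$LEPTONICA_VERSION" &&
    ./configure --prefix="$PREFIX/" &&
    make &&
    make install
)

build_tesseract() (
    curl -L "$(tesseract_url)" | tar -xzf - &&
    cd "tesseract-$TESSERACT_VERSION" &&
    ./autogen.sh &&
    export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig" &&
    LIBLEPT_HEADERSDIR="$PREFIX/include" ./configure --prefix="$PREFIX/" --with-extra-libraries="$PREFIX/lib" &&
    make &&
    make install
)

# Nothing is downloaded or built until the functions are called, e.g.:
#   build_leptonica && build_tesseract
```

Keeping the versions in variables makes it easy to try a newer release, but other versions may need different configure options than the ones shown here.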
@jonmz thanks for the guide. Unfortunately, it errors out for me at the make step for tesseract:
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: cannot find -lbrotlidec
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:3343: libtesseract.la] Error 1
make[1]: Leaving directory '/home/my_user_name/app/tesseract-5.3.3'
make: *** [Makefile:7836: all-recursive] Error 1
I did use higher version numbers (leptonica 1.83.1 & tesseract 5.3.3), maybe that's the cause...
I assume this is caused by a change in Tesseract 5.0, which added support for running OCR on URL-based images: https://github.com/tesseract-ocr/tesseract/commit/286d8275c783062057d09bb8e5e6607a8917abd9 This requires CURL, which in turn requires Brotli. You can either build without CURL support (--with-curl=no) or provide the Brotli library and header files as well, so that the linker can find -lbrotlidec.
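For the first option, the configure line from the original steps would look roughly like this. A sketch under assumptions: `PREFIX` and the helper function `tesseract_configure_cmd` are made up for illustration, and the function only prints the command so it can be checked before running it in the tesseract source directory.

```shell
# Sketch only: PREFIX and tesseract_configure_cmd are illustrative names.
PREFIX="${PREFIX:-$HOME/local}"

# Print the configure call from the original steps with libcurl support disabled.
tesseract_configure_cmd() {
    echo "LIBLEPT_HEADERSDIR=$PREFIX/include ./configure --prefix=$PREFIX/ --with-extra-libraries=$PREFIX/lib --with-curl=no"
}

tesseract_configure_cmd
```

After changing configure options, it is probably safest to re-run configure in a clean source tree before calling make again.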
That seems to have worked - sweet! Thanks so much!