centos 7.9 tesseract5.2 cmake failed! boxchar.cpp:65:75: error...

Open wuyang-dl opened this issue 1 year ago • 13 comments

error INFO:
/home/engine/wy/3rd/tesseract-5.2.0/src/training/pango/boxchar.cpp: In member function ‘void tesseract::BoxChar::GetDirection(int*, int*) const’:
/home/engine/wy/3rd/tesseract-5.2.0/src/training/pango/boxchar.cpp:65:75: error: ‘U_RIGHT_TO_LEFT_ISOLATE’ was not declared in this scope; did you mean ‘U_RIGHT_TO_LEFT_OVERRIDE’?
   65 |     if (dir == U_RIGHT_TO_LEFT || dir == U_RIGHT_TO_LEFT_ARABIC || dir == U_RIGHT_TO_LEFT_ISOLATE) {
      |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~
      |                                                                           U_RIGHT_TO_LEFT_OVERRIDE

Environment

  • Tesseract Version: 5.2.0
  • Commit Number: tag 5.2.0
  • Platform: CentOS 7.9

See the attached error.log for details.

wuyang-dl avatar Aug 03 '22 06:08 wuyang-dl

Did you search the issue tracker before posting the issue? E.g. https://github.com/tesseract-ocr/tesseract/issues/1374

zdenop avatar Aug 03 '22 14:08 zdenop

The CMake build should do what the Autotools build is doing: check for icu >=52.1 and refuse to build the training tools if this requirement is not met.

amitdo avatar Aug 04 '22 00:08 amitdo

@amitdo : it does: https://github.com/tesseract-ocr/tesseract/runs/7640401854?check_suite_focus=true#step:6:82

zdenop avatar Aug 04 '22 10:08 zdenop

Here is the relevant part of the check: https://github.com/tesseract-ocr/tesseract/blob/94b9ca4343743d38fbb635ca88e50621bc2d8beb/src/training/CMakeLists.txt#L71-L77

In the provided log there is no info about ICU checks...

zdenop avatar Aug 04 '22 10:08 zdenop

Thanks, Zdenko. I only looked at the CMakeLists.txt located in the root directory. I forgot that there is another one in the training dir.

amitdo avatar Aug 04 '22 10:08 amitdo

if(PKG_CONFIG_FOUND) 
     pkg_check_modules(ICU REQUIRED IMPORTED_TARGET icu-uc icu-i18n)

There is no version check here.
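
To make the missing check concrete, here is a minimal shell sketch of the kind of test a build could run before enabling the training tools. It only illustrates the idea and is not what Tesseract's CMake or Autotools files actually do. The helper name version_ge is hypothetical, the 52.1 minimum is the one quoted earlier in this thread, and GNU sort -V is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on GNU sort -V (version sort); the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum ICU version mentioned in this thread for the training tools.
REQUIRED_ICU="52.1"

if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists icu-uc; then
    found="$(pkg-config --modversion icu-uc)"
    if version_ge "$found" "$REQUIRED_ICU"; then
        echo "ICU $found is new enough for the training tools"
    else
        echo "ICU $found is older than $REQUIRED_ICU; skip the training tools"
    fi
else
    echo "icu-uc is not visible to pkg-config"
fi
```

pkg-config can also do the comparison itself with pkg-config --atleast-version=52.1 icu-uc, which returns a non-zero exit status when the installed version is too old.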

amitdo avatar Aug 04 '22 11:08 amitdo

@amitdo: you are missing the point: according to the reporter's log, there was no check for ICU at all, so adding a version requirement there does not solve the reporter's problem. I wonder how the reporter managed to skip the ICU presence check. Plus ICU 52.1 was released 2013-10-09, so I really wonder if somebody is using older version than that...

zdenop avatar Aug 05 '22 08:08 zdenop

> Plus ICU 52.1 was released 2013-10-09, so I really wonder if somebody is using older version than that...

http://mirror.centos.org/centos/7.9.2009/os/x86_64/Packages/

libicu-50.2-4.el7_7.i686.rpm

amitdo avatar Aug 06 '22 07:08 amitdo

So the solution is to use recent and well-maintained OS/distribution.

zdenop avatar Aug 06 '22 08:08 zdenop

> So the solution is to use recent and well-maintained OS/distribution.

Hi, I use libicui18n.so (installed with yum install libicu-devel), because the Tesseract check script reported this error:
-- Checking for modules 'icu-uc;icu-i18n'
-- No package 'icu-uc' found
-- No package 'icu-i18n' found

After installing libicu-devel, pango-devel and cairo-devel, I ran:
cmake -D CMAKE_INSTALL_PREFIX=/usr/local -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_SHARED_LIBS=ON ..

Then the error occurred (Linux version 3.10.0-1160.71.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Jun 28 15:37:28 UTC 2022).

I also tried Ubuntu (Linux version 5.4.0-122-generic (buildd@lcy02-amd64-035) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #138~18.04.1-Ubuntu SMP Fri Jun 24 14:14:03 UTC 2022), and Tesseract builds fine there.

Thanks

wuyang-dl avatar Aug 10 '22 03:08 wuyang-dl

You can try to build with autotools instead of CMake.

GCC 4.8 is not supported in Tesseract 5.x.

RHEL/CentOS have newer GCC versions in their repos: http://mirror.centos.org/centos/7.9.2009/sclo/x86_64/rh/Packages/d/

So you can install GCC 11.
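
A rough sketch of how one might check the compiler and switch to a newer toolset on CentOS 7. The helper major_of and the threshold MIN_GCC_MAJOR=7 are assumptions made for illustration (7 is chosen only because GCC 7.5 on Ubuntu worked in this thread, not because it is a documented minimum). The devtoolset-11 package names are the Software Collections ones and should be checked against the repository linked above.

```shell
#!/bin/sh
# Hypothetical helper: print the major component of a version string,
# e.g. "4.8.5" -> "4", "11" -> "11".
major_of() {
    printf '%s\n' "${1%%.*}"
}

# Assumption for illustration only: GCC 7.5 worked in this thread, 4.8.5 did not.
MIN_GCC_MAJOR=7

if command -v gcc >/dev/null 2>&1; then
    ver="$(gcc -dumpversion)"
    if [ "$(major_of "$ver")" -ge "$MIN_GCC_MAJOR" ]; then
        echo "gcc $ver looks recent enough"
    else
        echo "gcc $ver is too old; on CentOS 7 a newer toolset can be enabled with:"
        echo "  yum install centos-release-scl"
        echo "  yum install devtoolset-11-gcc devtoolset-11-gcc-c++"
        echo "  scl enable devtoolset-11 bash"
    fi
else
    echo "gcc not found in PATH"
fi
```

The scl enable command starts a new shell in which gcc and g++ resolve to the devtoolset versions, so CMake has to be re-run from a clean build directory inside that shell.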

amitdo avatar Aug 10 '22 04:08 amitdo

Do you plan to train your own model using Tesseract?

If not, ICU, Pango and Cairo are not required.

Autotools: Pango, Cairo and ICU only required by training tools

I don't know if CMake behaves in the same way.
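
If the training tools are not needed, one option worth trying is to switch them off explicitly at configure time. The sketch below only prints the command instead of running it. The option name BUILD_TRAINING_TOOLS is an assumption here and should be verified against the CMakeLists.txt of the checked-out Tesseract version. The helper configure_cmd is hypothetical, and the other flags are copied from the reporter's command.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a configure command that leaves
# the training tools out, so ICU, Pango and Cairo should not be needed.
# Assumption: BUILD_TRAINING_TOOLS is the option name in the project's
# CMakeLists.txt; verify it before relying on this.
configure_cmd() {
    printf 'cmake -D CMAKE_INSTALL_PREFIX=%s -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_SHARED_LIBS=ON -D BUILD_TRAINING_TOOLS=OFF ..\n' "$1"
}

# Show the command for the install prefix used earlier in this thread.
configure_cmd /usr/local
```

Run the printed command from an empty build directory, so that cached results from an earlier configure attempt do not interfere.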

amitdo avatar Aug 10 '22 09:08 amitdo

CMake behaves in the same way as Autotools. From the above information, I am sure that the reporter must have modified the CMake files to avoid these checks (ICU and GCC). So this is not a Tesseract problem, but a user problem (trying to compile Tesseract with software that is too old).

zdenop avatar Aug 14 '22 14:08 zdenop