Install Request: poppler as a module

Open themkots opened this issue 2 years ago • 24 comments

Application: poppler compiled with gcc 9.2.0

Link: https://poppler.freedesktop.org/

Cluster: Myriad (initially for econ-myriad users)

Description:

License:

Special versions or variants:

Ticket number: IN:05333938

themkots avatar Jun 10 '22 14:06 themkots

At present, running install.packages("tesseract") from the econ-myriad RStudio Server requires poppler. Although the poppler devel RPMs are installed and available, there are linker errors when R tries to build against poppler. Making a module for poppler with gcc 9.2.0, as Brian suggested, seems the right way to let econ-myriad users build tesseract locally with the above command.

themkots avatar Jun 10 '22 14:06 themkots

I am still unable to use this from the RStudio Server. I am unable to use the package pdftools because installation fails. I believe that there is still a problem with poppler. Is there an update on this?

larsnesheim avatar Jul 27 '22 10:07 larsnesheim

Is there any update on this? It seems nothing has happened since June 10??

larsnesheim avatar Oct 27 '22 07:10 larsnesheim

I've taken over looking into this from my colleague who has left the team.

balston avatar Oct 31 '22 14:10 balston

Latest version of Poppler is 22.10.0 so will start with this version. We will need both:

  • https://poppler.freedesktop.org/poppler-22.10.0.tar.xz
  • https://poppler.freedesktop.org/poppler-data-0.4.11.tar.gz

balston avatar Oct 31 '22 14:10 balston

I have a build script ready for testing.

balston avatar Oct 31 '22 15:10 balston

It didn't work the first time because it was picking up an older CMake version. That is fixed and it is now building using:

module -f unload compilers mpi gcc-libs
module load beta-modules
./Poppler-22.10.0_gnu-9.2.0_install 2>&1 | tee ~/Software/Poppler/Poppler-22.10.0_gnu-9.2.0_install.log-31102022-1
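
A quick way to catch the "older CMake" problem before a long build is to print which copy of the tool is first on PATH. This is only a sketch: `show_tool` is a hypothetical helper and is not part of the build script.

```shell
#!/usr/bin/env bash
# Sketch: report which copy of a tool is first on PATH and its version line.
# show_tool is a hypothetical helper, not part of the build script.
show_tool() {
    local tool="$1" path
    if path="$(command -v "$tool")"; then
        echo "$tool: $path"
        "$tool" --version | head -n 1
    else
        echo "$tool: not found on PATH"
        return 1
    fi
}

# Run this after the module commands and before starting the build.
show_tool cmake || true
```

If the path printed is not the one provided by the intended module, the module environment needs fixing before the build starts.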

balston avatar Oct 31 '22 15:10 balston

The build has failed with:

[ 25%] Building CXX object CMakeFiles/poppler.dir/poppler/PSTokenizer.cc.o
[ 25%] Building CXX object CMakeFiles/poppler.dir/poppler/SignatureInfo.cc.o
In file included from /dev/shm/tmp.DUVrfdxtul/poppler-22.10.0/poppler/SignatureInfo.cc:28:
/usr/include/nss3/hasht.h:48:29: error: ‘PRBool’ has not been declared
   48 |     void (*destroy)(void *, PRBool);
      |                             ^~~~~~
make[2]: *** [CMakeFiles/poppler.dir/poppler/SignatureInfo.cc.o] Error 1
make[1]: *** [CMakeFiles/poppler.dir/all] Error 2
make: *** [all] Error 2

Investigating ...

balston avatar Oct 31 '22 17:10 balston

Copies of the CMAKE build logs are in:

~ccspapp//Software/Poppler/CMakeError.log
~ccspapp//Software/Poppler/CMakeOutput.log

balston avatar Oct 31 '22 17:10 balston

This may be relevant:

https://bugs.freedesktop.org/show_bug.cgi?id=106388

balston avatar Oct 31 '22 18:10 balston

The patch described in the above link is needed on RedHat 7.x systems like ours. The build now gets further but fails with the following error:

[ 39%] Building CXX object CMakeFiles/poppler.dir/poppler/CurlCachedFile.cc.o
In file included from /dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.h:18,
                 from /dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc:15:
/dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc: In member function ‘virtual size_t CurlCachedFileLoader::init(CachedFile*)’:
/dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc:53:33: error: ‘CURLINFO_CONTENT_LENGTH_DOWNLOAD_T’ was not declared in this scope; did you mean ‘CURLINFO_CONTENT_LENGTH_DOWNLOAD’?
   53 |         curl_easy_getinfo(curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, &contentLength);
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/poppler.dir/poppler/CurlCachedFile.cc.o] Error 1
make[1]: *** [CMakeFiles/poppler.dir/all] Error 2
make: *** [all] Error 2

Possibly we need to load the Curl module during the build to pick up a newer version?

balston avatar Nov 01 '22 16:11 balston

We could switch libcurl support off using:

-DENABLE_LIBCURL=OFF

in the build script, but we should be able to get CMake to pick up the correct includes and library for curl/7.47.1/gnu-4.9.2.
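
The two options could be sketched as below. `curl_cmake_args` is a hypothetical helper and `/path/to/curl` is a placeholder, not the real install location. `CMAKE_PREFIX_PATH` is a standard CMake variable, but whether the actual build script uses it is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: choose CMake arguments for the two options discussed above.
# curl_cmake_args is a hypothetical helper; the real build script may differ.
curl_cmake_args() {
    local curl_prefix="${1:-}"
    if [ -z "$curl_prefix" ]; then
        # Option 1: build Poppler without libcurl support.
        echo "-DENABLE_LIBCURL=OFF"
    else
        # Option 2: keep libcurl and point CMake at a specific install prefix.
        echo "-DENABLE_LIBCURL=ON -DCMAKE_PREFIX_PATH=$curl_prefix"
    fi
}

# Example use inside a build directory (placeholder path):
#   cmake $(curl_cmake_args /path/to/curl) ..
curl_cmake_args
curl_cmake_args /path/to/curl
```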

balston avatar Nov 01 '22 16:11 balston

I switched on verbose output in the CMAKE config and the failing compile is using the correct include paths for the CURL module. The problem is that we have Curl 7.47.1 and Poppler requires at least 7.55.0.

We need to build a new Curl before continuing with the Poppler build. The latest Curl is 7.86.0, so I will start with that version.
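
The version requirement can be checked up front with a small comparison. This is a sketch that assumes GNU `sort` with `-V` is available; `version_ge` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare dotted version strings using GNU sort's version ordering.
# version_ge is a hypothetical helper; it succeeds when $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# 7.55.0 is the minimum mentioned above; the other two are the versions in this thread.
for v in 7.47.1 7.86.0; do
    if version_ge "$v" 7.55.0; then
        echo "curl $v: new enough"
    else
        echo "curl $v: too old"
    fi
done
```

On a live system the installed version could be read from `curl-config --version`, which prints a line of the form `libcurl X.Y.Z`.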

balston avatar Nov 02 '22 12:11 balston

Curl 7.86.0 is now installed. Note that while Myriad is down, development is being done on Kathleen, so this will need to be redone on Myriad.

balston avatar Nov 02 '22 15:11 balston

Poppler build script updated to use new Curl. Building again...

balston avatar Nov 02 '22 15:11 balston

The build has got a lot further this time before failing - up to 63%. It now fails with:

[ 63%] Building CXX object utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o
cd /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/utils && /shared/ucl/apps/gcc/9.2.0/bin/g++  -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0 -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/fofi -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/goo -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/poppler -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/poppler -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/utils -isystem /usr/include/cairo -isystem /usr/include/nss3 -isystem /usr/include/nspr4 -Wall -Wextra -Wpedantic -Wno-unused-parameter -Wcast-align -Wformat-security -Wframe-larger-than=65536 -Wlogical-op -Wmissing-format-attribute -Wnon-virtual-dtor -Woverloaded-virtual -Wmissing-declarations -Wundef -Wzero-as-null-pointer-constant -Wshadow -Wsuggest-override -fno-exceptions -fno-check-new -fno-common -fno-operator-names -D_DEFAULT_SOURCE -O2 -g  -fvisibility=hidden -fvisibility-inlines-hidden -std=c++17 -MD -MT utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o -MF CMakeFiles/pdfsig.dir/pdfsig.cc.o.d -o CMakeFiles/pdfsig.dir/pdfsig.cc.o -c /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils/pdfsig.cc
In file included from /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils/pdfsig.cc:31:
/usr/include/nss3/hasht.h:48:29: error: ‘PRBool’ has not been declared
   48 |     void (*destroy)(void *, PRBool);
      |                             ^~~~~~
make[2]: *** [utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o] Error 1
make[2]: Leaving directory `/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build'
make[1]: *** [utils/CMakeFiles/pdfsig.dir/all] Error 2
make[1]: Leaving directory `/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build'
make: *** [all] Error 2

This is the same as the first error so I must have missed one of the source files that needs to be patched for RedHat.
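
To avoid missing a file, the source tree can be searched for everything that mentions the NSS header before patching. A sketch, assuming GNU grep; `find_includers` and the `POPPLER_SRC` variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: list C++ sources and headers that mention a given header name.
# find_includers and POPPLER_SRC are hypothetical names for illustration.
find_includers() {
    local header="$1" tree="$2"
    # -r recurse, -l print file names only, -F treat the header name as a fixed string.
    grep -rlF --include='*.cc' --include='*.h' -- "$header" "$tree"
}

# Only runs when POPPLER_SRC points at an unpacked source tree.
if [ -d "${POPPLER_SRC:-}" ]; then
    find_includers hasht.h "$POPPLER_SRC"
fi
```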

balston avatar Nov 02 '22 15:11 balston

I had tried to patch utils/pdfsig.cc but had made an error. Now fixed. Building again ...

balston avatar Nov 02 '22 17:11 balston

Thank you for all this work. I am sorry that it is so complicated.

larsnesheim avatar Nov 02 '22 17:11 larsnesheim

Further progress through the build, up to 82%, and then:

[ 82%] Building C object glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o
cd /dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/glib/tests && /shared/ucl/apps/gcc/9.2.0/bin/gcc -DG_LOG_DOMAIN=\"Poppler\" -DTESTDATADIR=\"/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/../test\" -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0 -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/fofi -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/goo -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/poppler -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/poppler -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/glib -isystem /usr/include/glib-2.0 -isystem /usr/lib64/glib-2.0/include -isystem /usr/include/cairo -isystem /usr/include/freetype2 -Wall -std=c99 -D_DEFAULT_SOURCE -O2 -g  -fvisibility=hidden   -pthread  -DG_DISABLE_DEPRECATED  -DG_DISABLE_SINGLE_INCLUDES -pthread -std=c11 -MD -MT glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o -MF CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o.d -o CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o -c /dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c: In function ‘main’:
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:65:19: warning: implicit declaration of function ‘getopt’ [-Wimplicit-function-declaration]
   65 |     while ((opt = getopt(argc, argv, "h")) != -1) {
      |                   ^~~~~~
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:73:30: error: ‘optind’ undeclared (first use in this function)
   73 |     if (!usage && argc - 1 < optind) {
      |                              ^~~~~~
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:73:30: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o] Error 1
make[2]: Leaving directory `/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build'
make[1]: *** [glib/tests/CMakeFiles/pdfdrawbb.dir/all] Error 2
make[1]: Leaving directory `/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build'
make: *** [all] Error 2

I will investigate this one tomorrow.

balston avatar Nov 02 '22 18:11 balston

Patched glib/tests/pdfdrawbb.c to add getarg.h to the list of includes. This time the build has completed without errors.
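
A patch of this kind can be scripted so that it is repeatable in the build script. The sketch below inserts a given include line before the first existing `#include` in a file. `add_include` is a hypothetical helper, and the exact include line is whatever the patch needs.

```shell
#!/usr/bin/env bash
# Sketch: insert an include line before the first existing #include in a file.
# add_include is a hypothetical helper; pass the full include line as $2.
add_include() {
    local file="$1" line="$2" tmp
    tmp="$(mktemp)" || return 1
    awk -v new="$line" '
        !inserted && /^#include/ { print new; inserted = 1 }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (placeholder header name):
#   add_include glib/tests/pdfdrawbb.c '#include <some_header.h>'
```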

Next step is to produce a module file.

balston avatar Nov 03 '22 15:11 balston

Built on Myriad as well.

balston avatar Nov 03 '22 16:11 balston

Module file done and pulled to Myriad. Need to use the following module commands to access it:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/9.2.0
module load boost/1.75.0/gnu-4.9.2
module load curl/7.86.0/gnu-4.9.2
module load poppler/22.10.0/gnu-9.2.0
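
For job scripts or a shell startup file, the same sequence could be wrapped in one function. A sketch only: `load_poppler` is a hypothetical name, and `module` must already be defined by the cluster's environment-modules setup.

```shell
#!/usr/bin/env bash
# Sketch: wrap the module commands above in one function.
# load_poppler is a hypothetical name; `module` comes from the cluster environment.
load_poppler() {
    module -f unload compilers mpi gcc-libs &&
    module load beta-modules &&
    module load gcc-libs/9.2.0 &&
    module load boost/1.75.0/gnu-4.9.2 &&
    module load curl/7.86.0/gnu-4.9.2 &&
    module load poppler/22.10.0/gnu-9.2.0
}
```

The `&&` chain stops at the first command that reports a failure, so a missing module is less likely to go unnoticed.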

balston avatar Nov 03 '22 16:11 balston

User informed.

balston avatar Nov 03 '22 17:11 balston

Ran some simple tests on Myriad. For example:

pdfinfo ./CUDA/samples/NVIDIA_CUDA-11.3_Samples/NVIDIA_CUDA-11.3_Samples/3_Imaging/dct8x8/doc/dct8x8.pdf
Title:           App Note Template
Author:          Anton
Creator:         Microsoft® Word 2010
Producer:        Microsoft® Word 2010
CreationDate:    Tue Sep  3 23:54:34 2013 BST
ModDate:         Tue Sep  3 23:54:34 2013 BST
Custom Metadata: no
Metadata Stream: no
Tagged:          yes
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           15
Encrypted:       no
Page size:       612 x 792 pts (letter)
Page rot:        0
File size:       764608 bytes
Optimized:       no
PDF version:     1.5
[ccaabaa@login12 Software]$ which pdfinfo
/shared/ucl/apps/Poppler/22.10.0/gnu-9.2.0/bin/pdfinfo
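
A check like this could be turned into a small scripted smoke test. The sketch below pulls the page count out of `pdfinfo` output of the form shown above. `page_count` is a hypothetical helper, and the PDF path is supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch: extract the "Pages:" value from pdfinfo output read on stdin.
# page_count is a hypothetical helper; it exits non-zero if no Pages line is seen.
page_count() {
    awk '$1 == "Pages:" { print $2; found = 1 } END { exit !found }'
}

# Only runs when pdfinfo is on PATH and a PDF path is given as the first argument.
if command -v pdfinfo >/dev/null 2>&1 && [ -f "${1:-}" ]; then
    pdfinfo "$1" | page_count
fi
```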

balston avatar Nov 04 '22 11:11 balston