Problem with make install: file INSTALL cannot find lib/libginkgo_device.so.1.5.0
Hi,
I am experiencing the following issue with a local installation using make install. Note that make install fails at 100%, whereas make itself runs without issues.
[100%] Built target matrix
Install the project...
-- Install configuration: "Release"
-- Installing: /home/l00568700/tmp/bug_report/ginkgo/build/lib/pkgconfig/ginkgo.pc
-- Up-to-date: /home/l00568700/tmp/bug_report/ginkgo/build/include
-- Up-to-date: /home/l00568700/tmp/bug_report/ginkgo/build/include/ginkgo
-- Installing: /home/l00568700/tmp/bug_report/ginkgo/build/include/ginkgo/ginkgo.hpp
-- Installing: /home/l00568700/tmp/bug_report/ginkgo/build/include/ginkgo/core
...
-- Installing: /home/l00568700/tmp/bug_report/ginkgo/build/lib/cmake/Ginkgo/GinkgoTargets-release.cmake
CMake Error at devices/cmake_install.cmake:52 (file):
file INSTALL cannot find
"/home/l00568700/tmp/bug_report/ginkgo/build/lib/libginkgo_device.so.1.5.0":
No such file or directory.
Call Stack (most recent call first):
cmake_install.cmake:91 (include)
Makefile:143: recipe for target 'install' failed
make: *** [install] Error 1
Here is the output of my configure:
-- Summary of Configuration for Ginkgo (version 1.5.0 with tag develop, shortrev 75b2557763)
-- Ginkgo configuration:
-- CMAKE_BUILD_TYPE: Release
-- BUILD_SHARED_LIBS: ON
-- CMAKE_INSTALL_PREFIX: /home/l00568700/tmp/bug_report/ginkgo/build
-- PROJECT_SOURCE_DIR: /home/l00568700/tmp/bug_report/ginkgo
-- PROJECT_BINARY_DIR: /home/l00568700/tmp/bug_report/ginkgo/build
-- CMAKE_CXX_COMPILER: GNU 10.3.0 on platform Linux aarch64
-- /media/nfs/bdw-00-datashare/aarch64/easybuild/software/GCCcore/10.3.0/bin/c++
-- User configuration:
-- Enabled modules:
-- GINKGO_BUILD_OMP: ON
-- GINKGO_BUILD_MPI: ON
-- GINKGO_BUILD_REFERENCE: ON
-- GINKGO_BUILD_CUDA: OFF
-- GINKGO_BUILD_HIP: OFF
-- GINKGO_BUILD_DPCPP: OFF
-- Enabled features:
-- GINKGO_MIXED_PRECISION: OFF
-- Tests, benchmarks and examples:
-- GINKGO_BUILD_TESTS: ON
-- GINKGO_FAST_TESTS: OFF
-- GINKGO_BUILD_EXAMPLES: ON
-- GINKGO_EXTLIB_EXAMPLE:
-- GINKGO_BUILD_BENCHMARKS: ON
-- GINKGO_BENCHMARK_ENABLE_TUNING: OFF
-- Documentation:
-- GINKGO_BUILD_DOC: OFF
-- GINKGO_VERBOSE_LEVEL: 1
--
---------------------------------------------------------------------------------------------------------
--
-- Developer Tools:
-- GINKGO_DEVEL_TOOLS: OFF
-- GINKGO_WITH_CLANG_TIDY: OFF
-- GINKGO_WITH_IWYU: OFF
-- GINKGO_CHECK_CIRCULAR_DEPS: OFF
-- GINKGO_WITH_CCACHE: ON
---------------------------------------------------------------------------------------------------------
--
-- Components:
-- GINKGO_BUILD_HWLOC: ON
Regarding the rest of my setup, note that I was able to reproduce this issue on both Intel and ARM machines (these machines do not have accelerators). I am using Ubuntu 18.04 LTS and GCC 10.3, and I cloned the develop branch (commit 75b2557763) on 28.06.2022. Please let me know if more information about my system is needed.
This error is not critical for my usage of Ginkgo, but I report it here since I thought it might be useful for the developers and the community.
Best, Luka from Huawei Munich Research Center
Just for completeness, can you tell us which CMake version you are using, and attach your CMakeCache.txt? Is the environment entirely Easybuild based?
CMake 3.20.1. I attach the CMakeCache.txt for the Intel machine (I didn't find anything suspicious there myself). The environment is based on Easybuild, and only these packages are expected to be used (I can't guarantee it, but I hope that is indeed the case). I am loading the following packages:
module load goolf/2020a CMake/3.20.1 Python/3.9.5 git
module list
Currently Loaded Modules:
1) GCCcore/10.3.0 5) numactl/2.0.14 9) hwloc/2.4.1 13) libfabric/1.12.1 17) FFTW/3.3.9 21) bzip2/1.0.8 25) libreadline/8.1 29) libffi/3.3 33) Perl/5.32.1
2) zlib/1.2.11 6) XZ/5.2.5 10) OpenSSL/1.1 14) PMIx/3.2.3 18) ScaLAPACK/2.1.0 22) cURL/7.76.0 26) Tcl/8.6.11 30) expat/2.2.9 34) Python/3.9.5
3) binutils/2.36.1 7) libxml2/2.9.10 11) libevent/2.1.12 15) OpenMPI/4.1.1 19) goolf/2020a 23) libarchive/3.5.1 27) SQLite/3.35.4 31) gettext/0.21 35) git/2.32.0-nodocs
4) GCC/10.3.0 8) libpciaccess/0.16 12) UCX/1.10.0 16) OpenBLAS/0.3.15 20) ncurses/6.2 24) CMake/3.20.1 28) GMP/6.2.1 32) DB/18.1.40
I'm having a hard time reproducing your issue in a similar environment. What happens if you run make ginkgo_device? Does the same issue occur if you replace make by ninja? Normally, every library being installed should also have a dependency on the underlying file, so that looks really weird.
I agree, it's quite strange. It could be that something is wrong with my environment, and the Ginkgo installation is somehow triggering it. That being said, I have manually installed a lot of software on these machines, and I have never seen such a problem so far...
When I run make ginkgo_device, everything looks OK
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ make ginkgo_device
Building CXX object devices/CMakeFiles/ginkgo_device.dir/machine_topology.cpp.o
Building CXX object devices/CMakeFiles/ginkgo_device.dir/device.cpp.o
Linking CXX shared library ../lib/libginkgo_device.so
Built target ginkgo_device
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ ls lib/libginkgo_device.so
lib/libginkgo_device.so
so it is really the make install that fails. Unfortunately, I can't easily install ninja, as its dependencies require bigger changes to the system (kernel), which I currently cannot do...
So I looked more into the Ginkgo Makefiles and CMake files, and then analyzed the process. I see that when I simply run make, compilation finishes correctly and the libraries are already in lib/:
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ make -j48
...
[100%] Built target test_matrix_matrix_omp
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ ll lib/
total 52884
lrwxrwxrwx 1 l00568700 users 25 Jun 28 13:30 libginkgo_device.so -> libginkgo_device.so.1.5.0
-rwxr-xr-x 1 l00568700 users 28656 Jun 28 13:30 libginkgo_device.so.1.5.0
lrwxrwxrwx 1 l00568700 users 18 Jun 28 13:30 libgtest.so -> libgtest.so.1.11.0
-rwxr-xr-x 1 l00568700 users 578640 Jun 28 13:30 libgtest.so.1.11.0
lrwxrwxrwx 1 l00568700 users 23 Jun 28 13:30 libgtest_main.so -> libgtest_main.so.1.11.0
-rwxr-xr-x 1 l00568700 users 7864 Jun 28 13:30 libgtest_main.so.1.11.0
lrwxrwxrwx 1 l00568700 users 28 Jun 28 13:30 libginkgo_reference.so -> libginkgo_reference.so.1.5.0
-rwxr-xr-x 1 l00568700 users 2874408 Jun 28 13:30 libginkgo_reference.so.1.5.0
lrwxrwxrwx 1 l00568700 users 23 Jun 28 13:30 libginkgo_cuda.so -> libginkgo_cuda.so.1.5.0
lrwxrwxrwx 1 l00568700 users 22 Jun 28 13:30 libginkgo_hip.so -> libginkgo_hip.so.1.5.0
-rwxr-xr-x 1 l00568700 users 1228560 Jun 28 13:30 libginkgo_cuda.so.1.5.0
-rwxr-xr-x 1 l00568700 users 1225528 Jun 28 13:30 libginkgo_hip.so.1.5.0
lrwxrwxrwx 1 l00568700 users 24 Jun 28 13:30 libginkgo_dpcpp.so -> libginkgo_dpcpp.so.1.5.0
-rwxr-xr-x 1 l00568700 users 1234256 Jun 28 13:30 libginkgo_dpcpp.so.1.5.0
lrwxrwxrwx 1 l00568700 users 22 Jun 28 13:31 libginkgo_omp.so -> libginkgo_omp.so.1.5.0
-rwxr-xr-x 1 l00568700 users 12115032 Jun 28 13:31 libginkgo_omp.so.1.5.0
lrwxrwxrwx 1 l00568700 users 18 Jun 28 13:32 libginkgo.so -> libginkgo.so.1.5.0
-rwxr-xr-x 1 l00568700 users 34847392 Jun 28 13:32 libginkgo.so.1.5.0
However, when subsequently running make install, there is the aforementioned error.
At the end of the day, I do not need make install, but still it should not fail...
As discussed in our interactive debugging session, I attach the error output of CMake, generated by the command cmake -P cmake_install.cmake --trace-expand (which itself is executed at the end of make install).
Thanks for the details, this is most likely an issue with our RPATH settings: https://stackoverflow.com/questions/69881222/why-rpath-check-in-cmake-deletes-executable
@stanisic could you check what RPATH the libginkgo_device.so has? readelf -d file.so | grep RPATH
Actually, the output of the command you suggested is empty (for libginkgo_device.so and the other libs). I think that Easybuild doesn't set any RPATHs.
~~Easybuild may not, but CMake should set the RPATH to $ORIGIN, which is also what the install script assumes.~~ Can you post the full output of readelf here? It might be that it uses RUNPATH instead?
Good catch! Indeed, RUNPATH for libginkgo_device.so is effectively empty (it contains only colons), while for all other libs it points to the right directory:
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ for f in lib/*.so; do echo $f; readelf -d $f | grep RUNPATH; done
lib/libginkgo_cuda.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libginkgo_device.so
0x000000000000001d (RUNPATH) Library runpath: [:::::::]
lib/libginkgo_dpcpp.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libginkgo_hip.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libginkgo_omp.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libginkgo_reference.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libginkgo.so
0x000000000000001d (RUNPATH) Library runpath: [/media/nfs/bdw-00-datashare/x86_64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib:/media/nfs/bdw-00-datashare/x86_64/easybuild/software/libevent/2.1.12-GCCcore-10.3.0/lib64:/media/nfs/bdw-00-datashare/x86_64/easybuild/software/OpenMPI/4.1.1-GCC-10.3.0/lib:/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libgtest_main.so
0x000000000000001d (RUNPATH) Library runpath: [/home/l00568700/tmp/bug_report/ginkgo/build/lib:]
lib/libgtest.so
To answer your question, here is the full output of readelf for libginkgo_device.so:
l00568700@icx-00:~/tmp/bug_report/ginkgo/build$ readelf -d lib/libginkgo_device.so
Dynamic section at offset 0xcd28 contains 34 entries:
Tag Type Name/Value
0x0000000000000003 (PLTGOT) 0xdfe8
0x0000000000000002 (PLTRELSZ) 1728 (bytes)
0x0000000000000017 (JMPREL) 0x27b8
0x0000000000000014 (PLTREL) RELA
0x0000000000000007 (RELA) 0x1ee8
0x0000000000000008 (RELASZ) 2256 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffff9 (RELACOUNT) 78
0x0000000000000006 (SYMTAB) 0x1c8
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000005 (STRTAB) 0xb58
0x000000000000000a (STRSZ) 3572 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x1950
0x0000000000000004 (HASH) 0x1a48
0x0000000000000001 (NEEDED) Shared library: [libasan.so.6]
0x0000000000000001 (NEEDED) Shared library: [libhwloc.so.15]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libginkgo_device.so.1.5.0]
0x000000000000000c (INIT) 0x2e78
0x000000000000000d (FINI) 0xb334
0x000000000000001a (FINI_ARRAY) 0xdcc0
0x000000000000001c (FINI_ARRAYSZ) 24 (bytes)
0x0000000000000019 (INIT_ARRAY) 0xdcd8
0x000000000000001b (INIT_ARRAYSZ) 32 (bytes)
0x000000000000001d (RUNPATH) Library runpath: [:::::::]
0x000000006ffffff0 (VERSYM) 0x1d6c
0x000000006ffffffc (VERDEF) 0x1e38
0x000000006ffffffd (VERDEFNUM) 1
0x000000006ffffffe (VERNEED) 0x1e54
0x000000006fffffff (VERNEEDNUM) 3
0x0000000000000000 (NULL) 0x0
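For anyone who wants to script this check, here is a minimal sketch in POSIX shell that pulls the RUNPATH value out of readelf -d output and reports whether it names any real directory. The function names (extract_runpath, runpath_entries, runpath_is_placeholder) are made up for this illustration and are not part of Ginkgo or CMake. Reading a colon-only value such as ::::::: as padding that CMake reserves so it can rewrite the path at install time is an assumption based on CMake's usual behaviour for ELF targets.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the RUNPATH of a shared library.

# extract_runpath: read `readelf -d` output on stdin and print the value
# between the brackets of the "Library runpath: [...]" line, if present.
extract_runpath() {
    sed -n 's/.*Library runpath: \[\(.*\)].*/\1/p'
}

# runpath_entries VALUE: print the non-empty entries of a colon-separated
# RUNPATH value, one per line.
runpath_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d'
}

# runpath_is_placeholder VALUE: succeed (exit 0) when VALUE contains no real
# directory, i.e. it is empty or consists only of colons.
runpath_is_placeholder() {
    [ -z "$(runpath_entries "$1")" ]
}
```

With the output above, readelf -d lib/libginkgo_device.so | extract_runpath prints :::::::, and runpath_is_placeholder ':::::::' succeeds, while a value such as /home/l00568700/tmp/bug_report/ginkgo/build/lib: is reported as having one real entry.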
Can you attach the output of make VERBOSE=1 ginkgo on a clean build? We are mostly interested in the linker commands.
I attach the output of the command make -j48 VERBOSE=1 ginkgo from a clean build. I guess that the most important lines are the following:
[100%] Linking CXX shared library ../lib/libginkgo.so
cd /home/l00568700/tmp/bug_report/ginkgo/build/core && /media/nfs/bdw-00-datashare/aarch64/easybuild/software/CMake/3.20.1-GCCcore-10.3.0/bin/cmake -E cmake_link_script CMakeFiles/ginkgo.dir/link.txt --verbose=1
/media/nfs/bdw-00-datashare/aarch64/easybuild/software/GCCcore/10.3.0/bin/c++ -fPIC -O3 -DNDEBUG -Wl,-rpath -Wl,/media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib -Wl,-rpath -Wl,/media/nfs/bdw-00-datashare/aarch64/easybuild/software/libevent/2.1.12-GCCcore-10.3.0/lib64 -Wl,-rpath -Wl,/media/nfs/bdw-00-datashare/aarch64/easybuild/software/OpenMPI/4.1.1-GCC-10.3.0/lib -Wl,--enable-new-dtags -L/media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib -L/media/nfs/bdw-00-datashare/aarch64/easybuild/software/libevent/2.1.12-GCCcore-10.3.0/lib -shared -Wl,-soname,libginkgo.so.1.5.0 -o ../lib/libginkgo.so.1.5.0 CMakeFiles/ginkgo.dir/base/array.cpp.o CMakeFiles/ginkgo.dir/base/combination.cpp.o CMakeFiles/ginkgo.dir/base/composition.cpp.o CMakeFiles/ginkgo.dir/base/device_matrix_data.cpp.o CMakeFiles/ginkgo.dir/base/executor.cpp.o CMakeFiles/ginkgo.dir/base/index_set.cpp.o CMakeFiles/ginkgo.dir/base/mtx_io.cpp.o CMakeFiles/ginkgo.dir/base/perturbation.cpp.o CMakeFiles/ginkgo.dir/base/version.cpp.o CMakeFiles/ginkgo.dir/distributed/partition.cpp.o CMakeFiles/ginkgo.dir/factorization/elimination_forest.cpp.o CMakeFiles/ginkgo.dir/factorization/ic.cpp.o CMakeFiles/ginkgo.dir/factorization/ilu.cpp.o CMakeFiles/ginkgo.dir/factorization/par_ic.cpp.o CMakeFiles/ginkgo.dir/factorization/par_ict.cpp.o CMakeFiles/ginkgo.dir/factorization/par_ilu.cpp.o CMakeFiles/ginkgo.dir/factorization/par_ilut.cpp.o CMakeFiles/ginkgo.dir/log/convergence.cpp.o CMakeFiles/ginkgo.dir/log/logger.cpp.o CMakeFiles/ginkgo.dir/log/performance_hint.cpp.o CMakeFiles/ginkgo.dir/log/record.cpp.o CMakeFiles/ginkgo.dir/log/stream.cpp.o CMakeFiles/ginkgo.dir/matrix/coo.cpp.o CMakeFiles/ginkgo.dir/matrix/csr.cpp.o CMakeFiles/ginkgo.dir/matrix/dense.cpp.o CMakeFiles/ginkgo.dir/matrix/diagonal.cpp.o CMakeFiles/ginkgo.dir/matrix/ell.cpp.o CMakeFiles/ginkgo.dir/matrix/fbcsr.cpp.o CMakeFiles/ginkgo.dir/matrix/fft.cpp.o 
CMakeFiles/ginkgo.dir/matrix/hybrid.cpp.o CMakeFiles/ginkgo.dir/matrix/identity.cpp.o CMakeFiles/ginkgo.dir/matrix/permutation.cpp.o CMakeFiles/ginkgo.dir/matrix/sellp.cpp.o CMakeFiles/ginkgo.dir/matrix/sparsity_csr.cpp.o CMakeFiles/ginkgo.dir/matrix/row_gatherer.cpp.o CMakeFiles/ginkgo.dir/multigrid/amgx_pgm.cpp.o CMakeFiles/ginkgo.dir/multigrid/fixed_coarsening.cpp.o CMakeFiles/ginkgo.dir/preconditioner/isai.cpp.o CMakeFiles/ginkgo.dir/preconditioner/jacobi.cpp.o CMakeFiles/ginkgo.dir/reorder/rcm.cpp.o CMakeFiles/ginkgo.dir/solver/bicg.cpp.o CMakeFiles/ginkgo.dir/solver/bicgstab.cpp.o CMakeFiles/ginkgo.dir/solver/cb_gmres.cpp.o CMakeFiles/ginkgo.dir/solver/cg.cpp.o CMakeFiles/ginkgo.dir/solver/cgs.cpp.o CMakeFiles/ginkgo.dir/solver/fcg.cpp.o CMakeFiles/ginkgo.dir/solver/gmres.cpp.o CMakeFiles/ginkgo.dir/solver/idr.cpp.o CMakeFiles/ginkgo.dir/solver/ir.cpp.o CMakeFiles/ginkgo.dir/solver/lower_trs.cpp.o CMakeFiles/ginkgo.dir/solver/multigrid.cpp.o CMakeFiles/ginkgo.dir/solver/upper_trs.cpp.o CMakeFiles/ginkgo.dir/stop/combined.cpp.o CMakeFiles/ginkgo.dir/stop/criterion.cpp.o CMakeFiles/ginkgo.dir/stop/iteration.cpp.o CMakeFiles/ginkgo.dir/stop/residual_norm.cpp.o CMakeFiles/ginkgo.dir/stop/time.cpp.o CMakeFiles/ginkgo.dir/mpi/exception.cpp.o -Wl,-rpath,/home/l00568700/tmp/bug_report/ginkgo/build/lib: ../lib/libginkgo_omp.so.1.5.0 ../lib/libginkgo_cuda.so.1.5.0 ../lib/libginkgo_reference.so.1.5.0 ../lib/libginkgo_hip.so.1.5.0 ../lib/libginkgo_dpcpp.so.1.5.0 -lpthread ../lib/libginkgo_device.so.1.5.0 /media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib/libhwloc.so /media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib/libhwloc.so /media/nfs/bdw-00-datashare/aarch64/easybuild/software/OpenMPI/4.1.1-GCC-10.3.0/lib/libmpi.so
cd /home/l00568700/tmp/bug_report/ginkgo/build/core && /media/nfs/bdw-00-datashare/aarch64/easybuild/software/CMake/3.20.1-GCCcore-10.3.0/bin/cmake -E cmake_symlink_library ../lib/libginkgo.so.1.5.0 ../lib/libginkgo.so.1.5.0 ../lib/libginkgo.so
Looks like something is going wrong on CMake's side: the RPATH is indeed empty:
[ 0%] Linking CXX shared library ../lib/libginkgo_device.so
cd /home/l00568700/tmp/bug_report/ginkgo/build/devices && /media/nfs/bdw-00-datashare/aarch64/easybuild/software/CMake/3.20.1-GCCcore-10.3.0/bin/cmake -E cmake_link_script CMakeFiles/ginkgo_device.dir/link.txt --verbose=1
/media/nfs/bdw-00-datashare/aarch64/easybuild/software/GCCcore/10.3.0/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libginkgo_device.so.1.5.0 -o ../lib/libginkgo_device.so.1.5.0 CMakeFiles/ginkgo_device.dir/machine_topology.cpp.o CMakeFiles/ginkgo_device.dir/device.cpp.o -Wl,-rpath,::::::: /media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib/libhwloc.so /media/nfs/bdw-00-datashare/aarch64/easybuild/software/hwloc/2.4.1-GCCcore-10.3.0/lib/libhwloc.so
cd /home/l00568700/tmp/bug_report/ginkgo/build/devices && /media/nfs/bdw-00-datashare/aarch64/easybuild/software/CMake/3.20.1-GCCcore-10.3.0/bin/cmake -E cmake_symlink_library ../lib/libginkgo_device.so.1.5.0 ../lib/libginkgo_device.so.1.5.0 ../lib/libginkgo_device.so
make[3]: Leaving directory '/home/l00568700/tmp/bug_report/ginkgo/build'
I was able to reproduce the error with Ginkgo 1.6.0 and CMake 3.24.3. I think that it actually happens when I compile and install in the same directory (so the source and destination folders of make install are the same). E.g.:
wget https://github.com/ginkgo-project/ginkgo/archive/refs/tags/v1.6.0.tar.gz
tar -xvf v1.6.0.tar.gz
cd ginkgo-1.6.0/
mkdir build; cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=$PWD
make -j
make install
If I replace make install with e.g. make install DESTDIR=/tmp/, everything works fine.
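A short sketch of why DESTDIR avoids the clash: as far as I understand CMake's generated install scripts, DESTDIR is simply prepended to the absolute install prefix, so the files are copied to a staging location that no longer coincides with the build tree. The helper name staged_path is hypothetical and only illustrates that string concatenation.

```shell
#!/bin/sh
# staged_path DESTDIR PREFIX [RELATIVE_FILE]
# Print where a file ends up when DESTDIR is set: DESTDIR is prepended
# verbatim to the absolute install prefix (no separator is added or removed).
staged_path() {
    if [ -n "${3:-}" ]; then
        printf '%s%s/%s\n' "$1" "$2" "$3"
    else
        printf '%s%s\n' "$1" "$2"
    fi
}
```

For a build directory /home/user/build (a made-up path) that is also the prefix, staged_path /tmp/ /home/user/build lib/libginkgo.so prints /tmp//home/user/build/lib/libginkgo.so, which is a different file from the one in the build tree.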
Ah, that makes sense. Since the installation is trying to overwrite the library files with themselves and update the RPATH in the process, this is bound to fail. I guess we can ignore this scenario as not practically relevant.
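If someone still wants protection against this case, a small pre-install check could look like the sketch below. It is plain POSIX shell, the function names are hypothetical, and it only compares the resolved build directory with the resolved install prefix; it is not something Ginkgo ships.

```shell
#!/bin/sh
# Hypothetical pre-install guard: refuse to install when the install prefix
# resolves to the build directory itself.

# resolve_dir DIR: print the physical path of DIR (symlinks resolved);
# fails if DIR does not exist.
resolve_dir() {
    (cd "$1" 2>/dev/null && pwd -P)
}

# same_dir A B: succeed when both arguments name the same existing directory.
same_dir() {
    a=$(resolve_dir "$1") || return 1
    b=$(resolve_dir "$2") || return 1
    [ "$a" = "$b" ]
}

# check_prefix BUILD_DIR PREFIX: print a verdict and return non-zero when
# installing would copy the build outputs onto themselves.
check_prefix() {
    if same_dir "$1" "$2"; then
        echo "unsafe: install prefix is the build directory"
        return 1
    fi
    echo "ok"
}
```

Inside a build directory, the prefix can be read from the cache, e.g. prefix=$(sed -n 's/^CMAKE_INSTALL_PREFIX:[A-Z]*=//p' CMakeCache.txt), followed by check_prefix "$PWD" "$prefix" && make install. A prefix that does not exist yet is treated as safe, since it cannot be the build directory.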