
PMI1 support not detected

Open · davidozog opened this issue 7 years ago · 3 comments

On NERSC's Cori system, with Slurm PMI-1 support enabled via the configure options:

--with-pmi --with-pmi-libdir=/usr/lib64/slurmpmi --enable-pmi1

I run into the following link error when building the unit tests with "make check":

make[3]: Entering directory '/global/project/projectdirs/m88/ozog/Repos/SOS/build-cori-v1.4.2-ofi_v1.6.1-slurm_pmi/test/unit'
/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -std=gnu11 -g -O2 -Wall -g -fvisibility=hidden  -L/global/homes/o/ozog/usr/local/libfabric/cori-v1.6.1/lib -L/opt/cray/xpmem/default/lib64 -L/usr/lib64/slurmpmi  -o hello hello.o ../../src/libsma.la  -lfabric  -lxpmem -lpmi
libtool: link: gcc -std=gnu11 -g -O2 -Wall -g -fvisibility=hidden -o .libs/hello hello.o  -L/global/homes/o/ozog/usr/local/libfabric/cori-v1.6.1/lib -L/opt/cray/xpmem/default/lib64 -L/usr/lib64/slurmpmi ../../src/.libs/libsma.so -L/opt/cray/xpmem/2.2.15-6.0.7.1_5.8__g7549d06.ari/lib64 -L/opt/cray/udreg/2.3.2-6.0.7.0_33.18__g5196236.ari/lib64 -L/opt/cray/alps/6.6.43-6.0.7.0_26.4__ga796da3.ari/lib64 -L/opt/cray/ugni/6.0.14.0-6.0.7.0_23.1__gea11d3d.ari/lib64 /global/homes/o/ozog/usr/local/libfabric/cori-v1.6.1/lib/libfabric.so -ludreg -lalpsutil -lalpslli -lugni -lnl-3 -lnl-route-3 -lrt -lpthread -ldl -lxpmem -lpmi -Wl,-rpath -Wl,/global/homes/o/ozog/usr/local/SOS/cori/v1.4.2-ofi_v1.6.1-slurm_pmi/lib -Wl,-rpath -Wl,/global/homes/o/ozog/usr/local/libfabric/cori-v1.6.1/lib
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_exchange'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_get_size'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_abort'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_init'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_get'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_fini'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_put'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_barrier'
../../src/.libs/libsma.so: undefined reference to `shmem_runtime_get_rank'
/usr/bin/ld: link errors found, deleting executable `.libs/hello'
collect2: error: ld returned 1 exit status
Makefile:2111: recipe for target 'hello' failed
make[3]: *** [hello] Error 1
make[3]: Leaving directory '/global/project/projectdirs/m88/ozog/Repos/SOS/build-cori-v1.4.2-ofi_v1.6.1-slurm_pmi/test/unit'
Makefile:3746: recipe for target 'check-am' failed
make[2]: *** [check-am] Error 2
make[2]: Leaving directory '/global/project/projectdirs/m88/ozog/Repos/SOS/build-cori-v1.4.2-ofi_v1.6.1-slurm_pmi/test/unit'
Makefile:438: recipe for target 'check-recursive' failed
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory '/global/project/projectdirs/m88/ozog/Repos/SOS/build-cori-v1.4.2-ofi_v1.6.1-slurm_pmi/test'
Makefile:529: recipe for target 'check-recursive' failed
make: *** [check-recursive] Error 1

davidozog, Oct 08 '18 18:10

Was this fixed by #782?

jdinan, Oct 09 '18 14:10

Yes. I'm running all the unit tests now to be sure.

davidozog, Oct 09 '18 14:10

Something a little odd is going on with the RPATH handling for the unit tests in this configuration. I have to do a little extra to link in the correct libpmi: either prepend its directory to LD_LIBRARY_PATH explicitly or set CC to the installed oshcc when running "make check". With the correct PMI linked in, all tests pass. The problem affects "make check" only; I see no issues using oshcc outside the unit tests. So in short, yes, almost entirely fixed. :wink:

davidozog, Oct 09 '18 15:10
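The two "make check" workarounds from the last comment can be sketched as a small POSIX shell script. This is a hedged sketch, not something posted in the thread: PMI_LIBDIR reuses the --with-pmi-libdir value from the configure options above, SOS_PREFIX is a hypothetical install prefix assumed to contain bin/oshcc, and RUN is a hypothetical dry-run switch that only prints the commands unless it is set to an empty string.

```shell
#!/bin/sh
# Dry-run sketch of the two "make check" workarounds described above.
# Assumptions (not from the SOS docs): PMI_LIBDIR matches the
# --with-pmi-libdir value given to configure, and SOS_PREFIX is a
# hypothetical install prefix that contains bin/oshcc.
PMI_LIBDIR=/usr/lib64/slurmpmi
SOS_PREFIX=${SOS_PREFIX:-$HOME/usr/local/SOS}

# Commands are only printed by default; run with RUN= (empty) to execute them.
# ${RUN-echo} (no colon) leaves an explicitly empty RUN empty.
RUN=${RUN-echo}

# Option 1: prepend the Slurm PMI directory to LD_LIBRARY_PATH so the dynamic
# loader searches it first. ${VAR:+...} avoids a trailing ':' (which would add
# the current directory) when LD_LIBRARY_PATH was unset or empty.
LD_LIBRARY_PATH="${PMI_LIBDIR}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
$RUN make check

# Option 2 (alternative): build the unit tests with the installed oshcc
# wrapper. GNU make passes command-line variables on to its sub-makes.
$RUN make check CC="$SOS_PREFIX/bin/oshcc"
```

The two options are alternatives, so in practice you would keep only one of them; setting RUN= in the environment executes the chosen command instead of printing it.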