mpiT
Advice on installing?
Hi,
Thanks for releasing this library! I'm on Ubuntu 14.04, and I've tried using both `sudo apt-get install mpich` and `sudo apt-get install libopenmpi-dev` to install MPI. I can compile and run a basic hello world C program using MPI, but can't figure out how to build this library (in particular, I don't see `liboshmem.so` anywhere on my system). Do you have advice for the simplest way to set up MPI?
I seem to be able to install (at least without errors). I noticed the libraries are in `/usr/lib/openmpi/lib/*.so` while the binaries are `/usr/bin/mpicc` and `/usr/bin/mpicxx`, so I updated the MPI path to `/usr/lib/openmpi` and then set the binary locations by hand in the rock file.
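For locating these paths, a minimal sketch: the MPI compiler wrappers can print the underlying compiler command, which shows the include and library directories the wrapper itself uses. Open MPI's wrapper accepts `--showme` and MPICH's accepts `-show`. The function name `mpi_flags` is hypothetical and only for illustration.

```shell
#!/bin/sh
# mpi_flags: print the compile/link command that the mpicc wrapper would run.
# Hypothetical helper; tries the Open MPI spelling first, then the MPICH one.
mpi_flags() {
    if ! command -v mpicc >/dev/null 2>&1; then
        echo "mpicc not found on PATH"
        return 1
    fi
    mpicc --showme 2>/dev/null || mpicc -show 2>/dev/null
}

# The -I and -L entries in the output are the directories to compare
# against the paths set by hand in the rock file.
mpi_flags || true
```

If both MPICH and Open MPI are installed, the output also shows which of the two the `mpicc` on the PATH belongs to.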
However, the test breaks:
t-jouesa@rbgk40:~/code/mpiT$ mpirun -np 2 th test.lua
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
[C]: in function 'error'
...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
test.lua:3: in main chunk
[C]: in function 'dofile'
...rch_rbgk40/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
[C]: in function 'error'
...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
test.lua:3: in main chunk
[C]: in function 'dofile'
...rch_rbgk40/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
Thank you in advance.
Can you try running ldd on '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so'?
Sixin
On Dec 7, 2016, at 3:35 AM, Jonathan Uesato [email protected] wrote:
'/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so'
Hi, Thank you for the help! Sorry for the late reply.
t-jouesa@rbgk40:~/code/mpiT$ ldd /home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so
linux-vdso.so.1 => (0x00007fffdc7fe000)
libluaT.so.0 => /home/t-jouesa/torch_installs/torch_rbgk40/install/lib/libluaT.so.0 (0x00007f910242e000)
libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007f9102084000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9101e66000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9101aa1000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f910189d000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f910165d000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f9101453000)
/lib64/ld-linux-x86-64.so.2 (0x00007f910285a000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f9101247000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9101043000)
Maybe you can look into the issue at https://github.com/sixin-zh/mpiT/issues/22 (it seems to be related). Try doing this in test.lua:
local ffi = require("ffi")
ffi.load("libmpi", true)
require 'mpiT'
mpiT.Init()
Sixin
Hi, I've tried adding those lines, and it fails on the same line for the same reason (undefined symbol MPI_Iallreduce).
t-jouesa@rbgk40:~/code/mpiT$ mpirun -np 2 th test.lua
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
[C]: in function 'error'
...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
test.lua:5: in main chunk
Is this symbol supposed to be defined in `libmpi.so.1`? Is there any way I can test whether it's been defined yet?
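One way to check, as a minimal sketch: list the dynamic symbols the library exports with `nm` (from binutils) and search for the name. The helper name `has_symbol` is hypothetical, and the library path is the one `ldd` reported for `libmpi.so.1` earlier in this thread. For context, `MPI_Iallreduce` is a nonblocking collective introduced in MPI-3.0, so an MPI library that implements only an earlier version of the standard would not export it.

```shell
#!/bin/sh
# has_symbol LIB SYM: succeed if shared library LIB exports symbol SYM.
# Hypothetical helper; relies on GNU nm from binutils.
has_symbol() {
    nm -D --defined-only "$1" 2>/dev/null | grep -qw -- "$2"
}

# Path taken from the ldd output earlier in the thread.
lib=/usr/lib/libmpi.so.1

if has_symbol "$lib" MPI_Iallreduce; then
    echo "MPI_Iallreduce: exported by $lib"
else
    echo "MPI_Iallreduce: not found in $lib"
fi
```

If the symbol is not found, the installed MPI probably predates MPI-3, and mpiT would need to be built and run against a newer MPI library.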