Advice on installing?

Open juesato opened this issue 8 years ago • 5 comments

Hi,

Thanks for releasing this library! I'm on Ubuntu 14.04, and I've tried using both sudo apt-get install mpich and sudo apt-get install libopenmpi-dev to install MPI. I can compile and run a basic hello world C program using MPI, but can't figure out how to build this library (in particular, I don't see liboshmem.so anywhere on my system). Do you have advice for the simplest way to set up MPI?

juesato avatar Dec 07 '16 01:12 juesato

I seem to be able to install (at least without errors). I noticed the libraries are in /usr/lib/openmpi/lib/*.so while the binaries are /usr/bin/mpicc and /usr/bin/mpicxx, so I updated the MPI path to /usr/lib/openmpi and then set the binary locations by hand in the rockspec file.

However, the test breaks:

t-jouesa@rbgk40:~/code/mpiT$ mpirun -np 2 th test.lua
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
	/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
	[C]: in function 'error'
	...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	test.lua:3: in main chunk
	[C]: in function 'dofile'
	...rch_rbgk40/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00406670
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
	/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
	[C]: in function 'error'
	...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	test.lua:3: in main chunk
	[C]: in function 'dofile'
	...rch_rbgk40/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00406670
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

Thank you in advance.

juesato avatar Dec 07 '16 02:12 juesato

Can you try running ldd on '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so'?

Sixin

sixin-zh avatar Dec 07 '16 10:12 sixin-zh

Hi, Thank you for the help! Sorry for the late reply.

t-jouesa@rbgk40:~/code/mpiT$ ldd /home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so
	linux-vdso.so.1 =>  (0x00007fffdc7fe000)
	libluaT.so.0 => /home/t-jouesa/torch_installs/torch_rbgk40/install/lib/libluaT.so.0 (0x00007f910242e000)
	libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007f9102084000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9101e66000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9101aa1000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f910189d000)
	libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f910165d000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f9101453000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f910285a000)
	libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f9101247000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9101043000)

juesato avatar Dec 07 '16 19:12 juesato

Maybe you can look into the issue at https://github.com/sixin-zh/mpiT/issues/22, which seems to be related. Try doing this in test.lua:

local ffi = require("ffi")
ffi.load("libmpi",true)

require 'mpiT'

mpiT.Init()

Sixin

sixin-zh avatar Dec 07 '16 20:12 sixin-zh

Hi, I've tried adding those lines, and it fails on the same line for the same reason (undefined symbol MPI_Iallreduce).

t-jouesa@rbgk40:~/code/mpiT$ mpirun -np 2 th test.lua
/home/t-jouesa/torch_installs/torch_rbgk40/install/bin/luajit: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: ...s/torch_rbgk40/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libmpiT' from file '/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so':
	/home/t-jouesa/torch_installs/torch_rbgk40/install/lib/lua/5.1/libmpiT.so: undefined symbol: MPI_Iallreduce
stack traceback:
	[C]: in function 'error'
	...stalls/torch_rbgk40/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	test.lua:5: in main chunk

Is this symbol supposed to be defined in libmpi.so.1? Is there any way I can test whether it's been defined yet?

juesato avatar Dec 07 '16 20:12 juesato
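
On the last question: one way to test whether a shared library defines a symbol is to list its dynamic symbol table with `nm -D` and look for the name. The sketch below is illustrative only. `defines_symbol` is a hypothetical helper, not part of mpiT, and the library path is the one reported by ldd earlier in the thread. `MPI_Iallreduce` is a nonblocking collective introduced in MPI-3.0, so an MPI library that predates that standard may not export it at all, which would produce this kind of load error.

```shell
# Hypothetical helper: reads "nm -D" style output on stdin and succeeds
# if SYMBOL is listed as defined (any type letter other than U, which
# marks an undefined reference that another library must provide).
defines_symbol() {
    awk -v sym="$1" '
        # The symbol name is the last field; drop a version suffix
        # such as @@GLIBC_2.2.5 if one is present.
        { name = $NF; sub(/@.*/, "", name) }
        NF >= 2 && name == sym && $(NF-1) != "U" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

lib=/usr/lib/libmpi.so.1   # path taken from the ldd output in this thread
if command -v nm >/dev/null 2>&1 && [ -r "$lib" ]; then
    if nm -D "$lib" | defines_symbol MPI_Iallreduce; then
        echo "MPI_Iallreduce is exported by $lib"
    else
        echo "MPI_Iallreduce is missing from $lib"
    fi
fi
```

If the symbol turns out to be missing, the likely options are installing an MPI implementation that supports MPI-3 or building against one that does. Which of these applies here is an assumption until the check has been run on the machine in question.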