mpiT
errors with openMPI bindings
Hi Sixin,
mpiT works pretty well with MPICH; however, I get some errors when installing and testing mpiT with openMPI-2.0.1.
During installation, the following errors appear:
[ 50%] Building C object CMakeFiles/mpiT.dir/mpiT.c.o
In file included from /home/hao/tools/mpiT/mpiT.c:18:0:
/home/hao/tools/mpiT/lua-mpi.h: In function ‘_MPI_Op’:
/home/hao/tools/openmpi-2.0.1/install/include/mpi.h:313:46: warning: passing argument 2 of ‘luampi_push_MPI_Op’ from incompatible pointer type
#define OMPI_PREDEFINED_GLOBAL(type, global) ((type) ((void *) &(global)))
^
/home/hao/tools/mpiT/lua-mpi.h:19:28: note: in definition of macro ‘MPI_STRUCT_TYPE’
luampi_push_MPI_##s(L, inival, N); \
^
/home/hao/tools/openmpi-2.0.1/install/include/mpi.h:1055:27: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
#define MPI_DATATYPE_NULL OMPI_PREDEFINED_GLOBAL(MPI_Datatype, ompi_mpi_datatype_null)
^
/home/hao/tools/mpiT/lua-mpi.h:51:21: note: in expansion of macro ‘MPI_DATATYPE_NULL’
MPI_STRUCT_TYPE(Op, MPI_DATATYPE_NULL)
^
/home/hao/tools/mpiT/lua-mpi.h:8:15: note: expected ‘MPI_Op’ but argument is of type ‘struct ompi_datatype_t *’
static void luampi_push_MPI_##s(lua_State *L, MPI_##s init, int N) \
^
/home/hao/tools/mpiT/lua-mpi.h:51:1: note: in expansion of macro ‘MPI_STRUCT_TYPE’
MPI_STRUCT_TYPE(Op, MPI_DATATYPE_NULL)
^
In file included from /home/hao/tools/mpiT/mpiT.c:18:0:
/home/hao/tools/mpiT/lua-mpi.h: In function ‘register_constants’:
/home/hao/tools/mpiT/lua-mpi.h:150:3: warning: ‘ompi_mpi_ub’ is deprecated (declared at /home/hao/tools/openmpi-2.0.1/install/include/mpi.h:926): MPI_UB is deprecated in MPI-2.0 [-Wdeprecated-declarations]
luampi_push_MPI_Datatype(L, MPI_UB, 1); lua_setfield(L, -2, "UB");
^
/home/hao/tools/mpiT/lua-mpi.h:151:3: warning: ‘ompi_mpi_lb’ is deprecated (declared at /home/hao/tools/openmpi-2.0.1/install/include/mpi.h:925): MPI_LB is deprecated in MPI-2.0 [-Wdeprecated-declarations]
luampi_push_MPI_Datatype(L, MPI_LB, 1); lua_setfield(L, -2, "LB");
^
Though the build process continues and reports a successful installation, there are errors when running the test with mpirun -n 2 th test.lua:
[max:13085] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[max:13085] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[max:13085] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[max:13085] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15292,1],0]
Exit code: 1
--------------------------------------------------------------------------
Hi,
Can you be more precise about how to reproduce this error? Did you use luarocks make mpit-openmpi-1.rockspec as described at https://github.com/sixin-zh/mpiT#test-mpit?
Sixin
Yes, the installation errors occur when I use luarocks make mpit-openmpi-1.rockspec.
I built openMPI-2.0.1 following the instructions at https://www.open-mpi.org/faq/?category=building#easy-build, and I can compile and run sample C programs with openMPI.
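For completeness, the check was along these lines (a minimal sketch; hello.c stands in for any of the MPI example programs):
mpicc hello.c -o hello
mpirun -n 2 ./hello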
It’s indeed a strange issue. I looked online, and it seems you can avoid the problem by configuring openMPI with ./configure --disable-dlopen. I have tried this and it indeed works on Ubuntu, but I am not sure why.
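Roughly, the rebuild looks like this (a sketch only: the paths are taken from your logs and the -j value is arbitrary), followed by rebuilding mpiT against the reinstalled library:
cd /home/hao/tools/openmpi-2.0.1
./configure --prefix=/home/hao/tools/openmpi-2.0.1/install --disable-dlopen
make -j4 all
make install
luarocks make mpit-openmpi-1.rockspec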
Sixin
I've had similar issues recently; --disable-dlopen does not work with CUDA. Try making your ffi.load call ffi.load(arg, true): the true flag gives the library global scope for dlopen, which solved things for me.
Are you suggesting adding ffi support to mpiT? There is no ffi used in it so far. Supporting CUDA is very important, though. Are there a lot of APIs that need to be ported to use ffi.load(arg, true)?
Sixin
I have tried adding the ffi.load call to test.lua, and it works with dlopen now. Thanks for the help.
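-- load the MPI library with global symbol visibility (the second argument),
-- so openMPI's dlopen'ed plugin components can resolve its symbols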
local ffi = require("ffi")
ffi.load("libmpi", true)
require 'mpiT'
mpiT.Init()
Sixin