errors with openMPI bindings

Open ljk628 opened this issue 8 years ago • 6 comments

Hi Sixin,

mpiT works well with MPICH; however, I get some errors when installing and testing mpiT with openMPI-2.0.1.

During installation, the following errors appear:

[ 50%] Building C object CMakeFiles/mpiT.dir/mpiT.c.o
In file included from /home/hao/tools/mpiT/mpiT.c:18:0:
/home/hao/tools/mpiT/lua-mpi.h: In function ‘_MPI_Op’:
/home/hao/tools/openmpi-2.0.1/install/include/mpi.h:313:46: warning: passing argument 2 of ‘luampi_push_MPI_Op’ from incompatible pointer type
 #define OMPI_PREDEFINED_GLOBAL(type, global) ((type) ((void *) &(global)))
                                              ^
/home/hao/tools/mpiT/lua-mpi.h:19:28: note: in definition of macro ‘MPI_STRUCT_TYPE’
     luampi_push_MPI_##s(L, inival, N);                                  \
                            ^
/home/hao/tools/openmpi-2.0.1/install/include/mpi.h:1055:27: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
 #define MPI_DATATYPE_NULL OMPI_PREDEFINED_GLOBAL(MPI_Datatype, ompi_mpi_datatype_null)
                           ^
/home/hao/tools/mpiT/lua-mpi.h:51:21: note: in expansion of macro ‘MPI_DATATYPE_NULL’
 MPI_STRUCT_TYPE(Op, MPI_DATATYPE_NULL)
                     ^
/home/hao/tools/mpiT/lua-mpi.h:8:15: note: expected ‘MPI_Op’ but argument is of type ‘struct ompi_datatype_t *’
   static void luampi_push_MPI_##s(lua_State *L, MPI_##s init, int N)    \
               ^
/home/hao/tools/mpiT/lua-mpi.h:51:1: note: in expansion of macro ‘MPI_STRUCT_TYPE’
 MPI_STRUCT_TYPE(Op, MPI_DATATYPE_NULL)
 ^
In file included from /home/hao/tools/mpiT/mpiT.c:18:0:
/home/hao/tools/mpiT/lua-mpi.h: In function ‘register_constants’:
/home/hao/tools/mpiT/lua-mpi.h:150:3: warning: ‘ompi_mpi_ub’ is deprecated (declared at /home/hao/tools/openmpi-2.0.1/install/include/mpi.h:926): MPI_UB is deprecated in MPI-2.0 [-Wdeprecated-declarations]
   luampi_push_MPI_Datatype(L, MPI_UB, 1); lua_setfield(L, -2, "UB");
   ^
/home/hao/tools/mpiT/lua-mpi.h:151:3: warning: ‘ompi_mpi_lb’ is deprecated (declared at /home/hao/tools/openmpi-2.0.1/install/include/mpi.h:925): MPI_LB is deprecated in MPI-2.0 [-Wdeprecated-declarations]
   luampi_push_MPI_Datatype(L, MPI_LB, 1); lua_setfield(L, -2, "LB");
   ^
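
Judging from the notes, the warning comes from lua-mpi.h:51 seeding the Op wrapper with MPI_DATATYPE_NULL (a struct ompi_datatype_t *) where an MPI_Op is expected. Presumably the matching MPI_Op constant was intended; a one-line sketch, not tested:

/* lua-mpi.h:51 -- seed the Op wrapper with an MPI_Op constant rather than */
/* a datatype constant, which should silence the warning above:            */
MPI_STRUCT_TYPE(Op, MPI_OP_NULL)  /* was: MPI_STRUCT_TYPE(Op, MPI_DATATYPE_NULL) */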

Though the build continues and reports a successful installation, there are errors during testing with mpirun -n 2 th test.lua:

[max:13085] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[max:13085] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[max:13085] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/hao/tools/openmpi-2.0.1/install/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[max:13085] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[15292,1],0]
  Exit code:    1
--------------------------------------------------------------------------

ljk628 avatar Nov 28 '16 20:11 ljk628

Hi,

Can you be more precise about how to reproduce this error? Did you use luarocks make mpit-openmpi-1.rockspec (https://github.com/sixin-zh/mpiT#test-mpit)?

Sixin

sixin-zh avatar Nov 29 '16 08:11 sixin-zh

Yes, the installation errors occur when I use luarocks make mpit-openmpi-1.rockspec.

I built openMPI-2.0.1 following the instructions at https://www.open-mpi.org/faq/?category=building#easy-build, and I can compile and run sample C programs with openMPI.

ljk628 avatar Nov 29 '16 20:11 ljk628

It's indeed a strange issue. I looked online, and it seems you can avoid the problem by building Open MPI with ./configure --disable-dlopen. I have tried this and it indeed works on Ubuntu, but I am not sure why.
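
Something along these lines should work; the paths are just the ones from your logs and the flags are standard Open MPI configure options, so adjust as needed:

cd ~/tools/openmpi-2.0.1
./configure --prefix=$HOME/tools/openmpi-2.0.1/install --disable-dlopen
make -j4 && make install
# rebuild the bindings against the new install
luarocks make mpit-openmpi-1.rockspec

With --disable-dlopen the MCA components are built into the main libraries instead of being dlopen'ed at runtime, which would explain why the mca_shmem_* undefined-symbol errors go away.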

Sixin

sixin-zh avatar Dec 01 '16 21:12 sixin-zh

I've had similar issues recently; --disable-dlopen does not work with CUDA. Try calling ffi.load(arg, true) instead: the second argument loads the library with global scope for dlopen, which solved things for me.
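
A minimal sketch of that, assuming the Open MPI shared library is installed as libmpi.so and is visible to the dynamic loader:

-- load libmpi with global symbol scope (second argument true), so the MCA
-- plugins that Open MPI dlopen's later can resolve symbols like opal_show_help
local ffi = require("ffi")
ffi.load("libmpi", true)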

nicolasvasilache avatar Dec 01 '16 22:12 nicolasvasilache

Are you suggesting supporting ffi in mpiT? There's no ffi used in it so far. Supporting CUDA is very important, though. Are there a lot of APIs that need to be ported to use ffi.load(arg, true)?

Sixin

sixin-zh avatar Dec 02 '16 08:12 sixin-zh

I have tried adding the ffi.load call to test.lua, and it works with dlopen now. Thanks for the help.

local ffi = require("ffi")
ffi.load("libmpi", true)

require 'mpiT'

mpiT.Init()

Sixin

sixin-zh avatar Dec 02 '16 09:12 sixin-zh