
orte-clean broken in v4.0.6 and v4.1.1

Status: Open. jsquyres opened this issue 4 years ago • 3 comments

As reported by Sage Imel in https://www.mail-archive.com/[email protected]/msg34555.html (and reproduced locally by @jsquyres), it looks like orte-clean is broken in v4.0.6 and v4.1.1. Running it generates output like this:

[REDACTED:1918375] OPAL ERROR: Unreachable in file ext3x_client.c at line
252
[REDACTED:1918375] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file
base/ess_base_std_tool.c at line 142
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix.init failed
  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[REDACTED:1918375] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file
ess_tool_module.c at line 129
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

jsquyres, Jul 21 '21 19:07

This problem also occurs with version 4.1.4. The command that triggers it is: /usr/lib64/openmpi/bin/ompi-clean

[localhost.localdomain:00323] OPAL ERROR: Unreachable in file ext3x_client.c at line 252
[localhost.localdomain:00323] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file base/ess_base_std_tool.c at line 142
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix.init failed
  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[localhost.localdomain:00323] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file ess_tool_module.c at line 129
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

cherry530, Sep 05 '22 06:09

IIRC, you need to execute mpirun --pernode ompi-clean. The errors you show indicate that the ompi-clean process can't find the OMPI daemon it should connect to, which means it needs to be started by mpirun.

rhc54, Sep 05 '22 11:09
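For anyone scripting this check, the invocation suggested in the reply above can be wrapped so the "Unreachable (-12)" failure quoted in this issue is detected automatically. This is a minimal sketch, not part of Open MPI: the helper names `clean_command`, `hit_unreachable`, and `run_clean` are hypothetical, and the detection only matches the literal "Returned value Unreachable (-12)" text from the logs above. It diagnoses the failure; it does not fix it.

```python
from __future__ import annotations

import shutil
import subprocess

# Signature copied from the error output quoted in this issue: the tool could
# not reach the daemon it should connect to, so init returned Unreachable (-12).
UNREACHABLE_SIGNATURE = "Returned value Unreachable (-12)"


def clean_command(launch_with_mpirun: bool = True) -> list[str]:
    """Build the ompi-clean command line (hypothetical helper).

    With launch_with_mpirun=True this is `mpirun --pernode ompi-clean`,
    i.e. one ompi-clean per node, started by mpirun as suggested above.
    """
    if launch_with_mpirun:
        return ["mpirun", "--pernode", "ompi-clean"]
    return ["ompi-clean"]


def hit_unreachable(output: str) -> bool:
    """Return True if the captured output contains the Unreachable (-12) failure."""
    return UNREACHABLE_SIGNATURE in output


def run_clean(launch_with_mpirun: bool = True) -> bool:
    """Run the command and report whether it hit the failure (hypothetical helper)."""
    cmd = clean_command(launch_with_mpirun)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    # The error text may appear on stdout or stderr, so check both.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return hit_unreachable(proc.stdout + proc.stderr)
```

Calling `run_clean()` on a node with Open MPI on the PATH returns True when the run fails the same way as the logs in this thread; `run_clean(False)` checks a bare `ompi-clean` invocation for comparison.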

Running "mpirun --pernode ompi-clean" reports the same error. Will orte-clean be removed, or will it be fixed in a future release?

cherry530, Sep 08 '22 06:09