orte-clean broken in v4.0.6 and v4.1.1
As reported by Sage Imel in https://www.mail-archive.com/[email protected]/msg34555.html (and reproduced locally by @jsquyres), it looks like orte-clean is broken in v4.0.6 and v4.1.1. Running it generates output like this:
[REDACTED:1918375] OPAL ERROR: Unreachable in file ext3x_client.c at line 252
[REDACTED:1918375] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file base/ess_base_std_tool.c at line 142
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_pmix.init failed
--> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[REDACTED:1918375] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file ess_tool_module.c at line 129
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
This problem also occurs with version 4.1.4. The command that triggers it is: /usr/lib64/openmpi/bin/ompi-clean
[localhost.localdomain:00323] OPAL ERROR: Unreachable in file ext3x_client.c at line 252
[localhost.localdomain:00323] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file base/ess_base_std_tool.c at line 142
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_pmix.init failed
--> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[localhost.localdomain:00323] [[INVALID],INVALID] ORTE_ERROR_LOG: Unreachable in file ess_tool_module.c at line 129
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unreachable (-12) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
IIRC, you need to run "mpirun --pernode ompi-clean". The errors you show indicate that the ompi-clean process can't find the OMPI daemon it should connect to, which means it needs to be launched by mpirun.
Running it as "mpirun --pernode ompi-clean" reports the same error. Will orte-clean be removed, or will it be fixed in a future release?