Variable forwarding in case of spawn not working for children

Open hamme74 opened this issue 3 months ago • 14 comments

Setting an environment variable with "-x" for an MPI simulation that calls MPI_Comm_spawn does not work as expected. The parent processes see the value set by "-x", but the spawned children do not.

Open MPI v5.0.9 from tar.gz
hwloc v2.12.2 from tar.gz
OpenPMIx: master branch from 2025-09-08 from git (also tested: v6.0.0 from tar.gz)
prrte: master branch from 2025-09-08 from git (also tested: v4.0.0 from tar.gz)
libfabric v2.2.0 from tar.gz

Tested on RHEL 8.10.

Code parent

#include <cstdlib>     // std::getenv
#include <filesystem>
#include <iostream>
#include <string>
#include <mpi.h>

std::filesystem::path GetInstallPath(char *argv[]) {
    std::filesystem::path myInstallPath = std::filesystem::absolute(argv[0]);
    myInstallPath = myInstallPath.parent_path();
    myInstallPath = std::filesystem::canonical(myInstallPath);
    return myInstallPath;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    char const* pUhu = std::getenv("uhu");
    if (pUhu) {
        std::cout << "parent  uhu: " << pUhu << std::endl;
    }

    std::filesystem::path sInstallPath = GetInstallPath(argv);
    std::string const sChildPath = (sInstallPath / "child.out").string();

    int iRank = -1;
    MPI_Comm_rank(MPI_COMM_WORLD, &iRank);

    int iSize = -1;
    MPI_Comm_size(MPI_COMM_WORLD, &iSize);

    int iName = -1;
    char sName[MPI_MAX_PROCESSOR_NAME];
    MPI_Get_processor_name(sName, &iName);

    std::cout << "I am a parent with rank: " << iRank << ", " << sName << std::endl;

    MPI_Comm parentcomm;
    MPI_Comm_get_parent(&parentcomm);

    MPI_Comm intercomm;

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "map_by", "node");

    MPI_Comm_spawn(sChildPath.c_str(), MPI_ARGV_NULL, iSize, info, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);

    MPI_Finalize();
    return 0;
}

Code child

#include <cstdlib>
#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    char const* pUhu = std::getenv("uhu");
    if (pUhu) {
        std::cout << "child uhu: " << pUhu << std::endl;
    }

    int iRank = -1;
    MPI_Comm_rank(MPI_COMM_WORLD, &iRank);

    int iName = -1;
    char sName[MPI_MAX_PROCESSOR_NAME];
    MPI_Get_processor_name(sName, &iName);

    std::cout << "I am a child with rank: " << iRank << ", " << sName << std::endl;
    system("taskset -cp $$");

    MPI_Finalize();
    return 0;
}

$ export uhu=666
$ /opt/openmpi/bin/mpiexec --runtime-options fwd-environment --map-by node:oversubscribe -x uhu=777 --host node1,node2 -n 2 ./parent.out

The call above results in:

parent uhu: 777
parent uhu: 777
I am a parent with rank: 0, node1
I am a parent with rank: 1, node2
child uhu: 666
child uhu: 666
I am a child with rank: 1, node2
I am a child with rank: 0, node1
pid 667310's current affinity list: 0-7
pid 668545's current affinity list: 0-7

Even if --fwd-env is not set, only the parent processes see the variable "uhu" set by "-x". The children should see it, too, which would also make the behavior consistent with --fwd-env.

Just to mention it: with the latest versions of pmix and prrte, fwd-environment works as expected.

The options --unset-env, --prepend-env and --append-env should be checked as well. I did not test them, but I expect that they also modify only the parent's environment.

Many thanks in advance for looking into this issue.

hamme74, Sep 09 '25 07:09