Internal error using shmem_reduce in example/oshmem_max_reduction.c

Status: Open. smguzik opened this issue 1 year ago • 8 comments

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source tarball using: Configure command line: '--build=x86_64-linux-gnu' '--prefix=/usr/local/openmpi/5.0.2_gcc-12.2.0' '--with-ucx' '--with-pmix=internal' '--with-libevent=external' '--with-hwloc=external' '--enable-mpi-fortran=all' '--with-cuda=/usr/local/cuda' '--with-cuda-libdir=/usr/lib/x86_64-linux-gnu'

Please describe the system on which you are running

  • Operating system/version: Debian 12.4
  • Computer hardware: x86_64
  • Network type: Single node

Details of the problem

oshmem_max_reduction.c works as provided in the examples directory. However, switching to the newer team-based API (OpenSHMEM 1.5) by replacing

shmem_long_max_to_all(dst, src, N, 0, 0, num_pes, pWrk, pSync);

with

shmem_long_max_reduce(SHMEM_TEAM_WORLD, dst, src, N);

fails with the message

[shmem_reduce.c:473:pshmem_long_max_reduce] Internal error is appeared rc = -7

smguzik, Mar 20 '24 18:03
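Both calls should leave the same element-wise maxima in dst on every participating PE. The following is a minimal, library-free reference model of the legacy call's active-set semantics, offered as a sketch for checking results. `model_long_max_to_all` is a hypothetical helper and not part of OpenSHMEM. The active set is PE_start + k * 2^logPE_stride for k in [0, PE_size), so the original arguments `0, 0, num_pes` select every PE. That is the same set `SHMEM_TEAM_WORLD` covers in the team-based call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial model (no OpenSHMEM needed) of what
 *   shmem_long_max_to_all(dest, source, nreduce, PE_start, logPE_stride,
 *                         PE_size, pWrk, pSync)
 * leaves in dest on every PE of the active set.
 * src is a row-major [total_pes][nreduce] array: row p holds PE p's source. */
static void model_long_max_to_all(long *dest, const long *src, size_t nreduce,
                                  int pe_start, int log_pe_stride, int pe_size)
{
    int stride = 1 << log_pe_stride; /* active-set stride is 2^logPE_stride */
    assert(pe_size > 0);
    for (size_t i = 0; i < nreduce; i++) {
        /* start from the first PE in the active set */
        long m = src[(size_t)pe_start * nreduce + i];
        for (int k = 1; k < pe_size; k++) {
            size_t pe = (size_t)(pe_start + k * stride);
            long v = src[pe * nreduce + i];
            if (v > m)
                m = v;
        }
        dest[i] = m;
    }
}
```

Comparing dst on each PE against this model is a quick way to confirm that whichever call you end up using produces the expected maxima.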

@janjust I see --with-ucx; I guess you would be interested 😄

wenduwan, Mar 21 '24 15:03

Added the main label, assuming oshmem on main is the same as in v5.0.x.

wenduwan, Mar 21 '24 15:03

It seems that the new API is not yet implemented in the UCX SPML module (or anywhere else):

From ucx/spml.c:1850

/* This routine is not implemented */
int mca_spml_ucx_team_reduce(shmem_team_t team, void *dest, const void *source,
                             size_t nreduce, int operation, int datatype)
{
    return OSHMEM_ERR_NOT_IMPLEMENTED;
}

@MamziB Any chance I'm missing something, or is this a known TBD?

roiedanino, Apr 04 '24 13:04

I am having the same issue. Should I use the old OpenSHMEM API, or is there a way to work around this?

popina1994, Apr 04 '24 23:04

@roiedanino Yes, we will implement this in the future. @popina1994, regarding "Should I use the old OpenSHMEM API or is there a way to bypass this?": yes, please go ahead and use the old OpenSHMEM API for now. If I find a better workaround, I will update here.

MamziB, Apr 05 '24 18:04
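Going back to the legacy call means allocating the pWrk and pSync arrays yourself. As I read the OpenSHMEM specification, pWrk needs at least max(nreduce/2 + 1, SHMEM_REDUCE_MIN_WRKDATA_SIZE) elements, and pSync needs SHMEM_REDUCE_SYNC_SIZE elements, each set to SHMEM_SYNC_VALUE before first use. The sketch below computes the pWrk size. `legacy_pwrk_elems` is a hypothetical helper, and the minimum is passed in as a parameter so that it compiles without shmem.h.

```c
#include <stddef.h>

/* Hypothetical helper: minimum pWrk element count for the legacy
 * shmem_<type>_<op>_to_all() reductions, assumed from the OpenSHMEM spec:
 *   max(nreduce / 2 + 1, SHMEM_REDUCE_MIN_WRKDATA_SIZE)
 * min_wrkdata is a parameter so this compiles without shmem.h; in a real
 * program pass SHMEM_REDUCE_MIN_WRKDATA_SIZE. */
static size_t legacy_pwrk_elems(size_t nreduce, size_t min_wrkdata)
{
    size_t needed = nreduce / 2 + 1;
    return needed > min_wrkdata ? needed : min_wrkdata;
}

/* Sketch of use inside an OpenSHMEM program (not compiled here):
 *   long *pWrk  = shmem_malloc(legacy_pwrk_elems(N, SHMEM_REDUCE_MIN_WRKDATA_SIZE)
 *                              * sizeof(long));
 *   long *pSync = shmem_malloc(SHMEM_REDUCE_SYNC_SIZE * sizeof(long));
 *   for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
 *       pSync[i] = SHMEM_SYNC_VALUE;
 *   shmem_barrier_all();   // all PEs must initialize pSync before first use
 *   // dst and src must be symmetric as well
 *   shmem_long_max_to_all(dst, src, N, 0, 0, shmem_n_pes(), pWrk, pSync);
 */
```

Using shmem_malloc (or static/global arrays) rather than malloc matters here, because dest, source, pWrk and pSync must all be symmetric objects for the legacy reduction.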

@MamziB Can you reassign this to yourself, please?

gleon99, Apr 07 '24 12:04

@MamziB ?

gleon99, Apr 21 '24 11:04

@gleon99 Sure, let me assign it to myself. Thanks for the reminder.

MamziB, Apr 22 '24 14:04