shm: MPI_Win_create + MPI_Win_shared_query does not work
Version: mpich v4.2.0rc3
I want to access, from rank 1, a memory window that rank 0 created with MPI_Win_create, using MPI_Win_shared_query.
It does not work as expected (see code appended).
Instead of getting the base pointer of rank 0 (via MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &baseptr); ), on rank 1 I get the base pointer of my own window, together with the size of my own window.
The absence of a shared memory window seems to be standard-conforming, however unexpected.
But the returned size should then be 0, not the size of my local window, correct?
Two questions arise:
- Is the current implementation MPI-conforming?
- Will MPI_Win_create be able to create shared memory windows accessible with MPI_Win_shared_query?
Maybe I did something wrong installation-wise or code-wise; I am thankful for any comment!
This is the code path in mpidig_win.h that MPI_Win_shared_query takes in this case:
/* When only single process exists on the node or shared memory allocation fails,
* should only query MPI_PROC_NULL or local process. Thus, return local window's info. */
if (win->comm_ptr->node_comm == NULL || !shared_table) {
*size = win->size;
*disp_unit = win->disp_unit;
*((void **) baseptr) = win->base;
goto fn_exit;
}
The change log for mpich-v4.2.0rc3 states:
# MPI_Win_shared_query can be used on windows created by MPI_Win_create,
MPI_Win_allocate, in addition to windows created by MPI_Win_allocate_shared.
MPI_Win_allocate will create shared memory whenever feasible, including between
spawned processes on the same node.
The MPIv4.1 standard states:
MPI_Win_shared_query( )
...
Only MPI_WIN_ALLOCATE_SHARED is guaranteed to allocate shared memory. Implementations are permitted, where possible, to provide shared memory for windows created with MPI_WIN_CREATE and MPI_WIN_ALLOCATE. However, availability of shared memory is not guaranteed. When the remote memory segment corresponding to a particular process cannot be accessed directly, this call returns size = 0 and a baseptr as if MPI_ALLOC_MEM was called with size = 0.
...
_Advice to users._ For windows allocated using MPI_WIN_ALLOCATE or MPI_WIN_CREATE, the group of MPI processes for which the implementation may provide shared memory can be determined using MPI_COMM_SPLIT_TYPE described in Section 7.4.2. (End of advice to users.)
$ mpirun --version
HYDRA build details:
Version: 4.2.0rc3
Release Date: Tue Jan 30 10:03:39 CST 2024
CC: gcc
Configure options: '--disable-option-checking' '--prefix=/home/max/mpich_4.2.0rc3' '--cache-file=/dev/null' '--srcdir=../../../../src/pm/hydra' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -DNETMOD_INLINE=__netmod_inline_ofi__ -I/home/max/tests/mpich-4.2.0rc3/_build/src/mpl/include -I/home/max/tests/mpich-4.2.0rc3/src/mpl/include -I/home/max/tests/mpich-4.2.0rc3/modules/json-c -I/home/max/tests/mpich-4.2.0rc3/_build/modules/json-c -D_REENTRANT -I/home/max/tests/mpich-4.2.0rc3/_build/src/mpi/romio/include -I/home/max/tests/mpich-4.2.0rc3/src/pmi/include -I/home/max/tests/mpich-4.2.0rc3/_build/src/pmi/include -I/home/max/tests/mpich-4.2.0rc3/_build/modules/yaksa/src/frontend/include -I/home/max/tests/mpich-4.2.0rc3/modules/yaksa/src/frontend/include -I/home/max/tests/mpich-4.2.0rc3/_build/modules/libfabric/include -I/home/max/tests/mpich-4.2.0rc3/modules/libfabric/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs cobalt
Demux engines available: poll select
Example Code
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <assert.h>
#define SIZE 10
int main(int argc, char *argv[]) {
MPI_Init(&argc, &argv);
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
size = SIZE; // same window size on every rank
// Create shared communicator
MPI_Comm shared_comm;
MPI_Comm_split_type(MPI_COMM_WORLD,
MPI_COMM_TYPE_SHARED,
0,
MPI_INFO_NULL,
&shared_comm);
float* data;
data = (float *)calloc(size, sizeof(float));
// Initialize array on rank 0
if ( rank == 0 ){
for (int i = 0; i < SIZE; i++) {
data[i] = i;
}
} else {
for (int i = 0; i < SIZE; i++) {
data[i] = i+SIZE;
}
}
printf("Rank: %i, Data on rank 0 before modification: \n", rank);
for (int i = 0; i < SIZE; i++) {
printf("%.2f ", data[i]);
}
printf("\n");
fflush(stdout);
// Create Windows
MPI_Win win;
int err = MPI_Win_create(&data[0],
size * sizeof(float),
sizeof(float),
MPI_INFO_NULL,
shared_comm,
&win);
assert(err == MPI_SUCCESS);
MPI_Win_fence(0, win);
float *baseptr;
if (rank != 0) {
// Use MPI_Win_shared_query
int disp_unit;
MPI_Aint ssize;
MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &baseptr);
assert(disp_unit > 0);
assert(ssize > 0);
// Access shared data on non-zero ranks after querying
printf("Data on rank %d: \n", rank);
for (int i = 0; i < SIZE; i++) {
printf("%.2f ", baseptr[i]);
}
printf("\n");
fflush(stdout);
float my_val = 123.9;
baseptr[3] = my_val; // Modify data through shared base pointer
printf("Modify data through shared base pointer: 'baseptr[3] = %f'\n", my_val);
fflush(stdout);
printf("Data on rank %d after modification: \n", rank);
for (int i = 0; i < SIZE; i++) {
printf("%.2f ", data[i]);
}
printf("\n");
fflush(stdout);
}
MPI_Win_fence(0, win);
MPI_Barrier(shared_comm);
MPI_Win_fence(0, win);
// Print data on rank 0 after modification
if (rank == 0) {
printf("Data on rank 0 after modification: \n");
for (int i = 0; i < SIZE; i++) {
printf("%.2f ", data[i]);
}
printf("\n");
fflush(stdout);
}
MPI_Win_fence(0, win);
MPI_Win_free(&win);
free(data); // Free dynamically allocated memory
MPI_Finalize();
return 0;
}
I think it is a bug. It should return size = 0 to tell you that the memory is not shared.
Thank you for your quick reply!
I would like to linger on the question of whether MPI_Win_create will be able to create shared memory windows accessible with MPI_Win_shared_query. It currently does not look like this will be supported soon(?).
Would it be possible to highlight the fact that MPI_Win_create does not make memory available to MPI_Win_shared_query, and that this call will always return size = 0, except when querying one's own window?
This could be done by adding a short sentence to the change log. Explicitly mentioning this fact would be very helpful!
Thanks.
You are correct that in the current release MPI_Win_create does not make the memory accessible to other processes, even when they are in the same shared-memory domain. But it seems plausible that with kernel modules such as CMA and XPMEM we could expose the memory to each other. So stay tuned.