
comment: assumptions about the size of shared memory domains

Open jeffhammond opened this issue 3 years ago • 1 comment

It is not uncommon to have shared-memory domains of 128 or more processes. Many HPC systems have dual-socket nodes with 64-core AMD CPUs (128 cores per node), and ARM-based HPC nodes can have more (e.g. Ampere Altra Max CPUs have 128 cores per socket).

If ~200 is still a small number for your purposes, then it's fine, but I'd be careful about such assumptions going forward.

        /* No allreduce here because this is a shared memory domain
         * and should be a relatively small number of processes
         * and a non performance sensitive API.
         */
        for (i = 0; i < shm_comm_ptr->local_size; i++) {
            shm_offsets[i] = (MPI_Aint) total_shm_size;
            if (MPIDIG_WIN(win, info_args).alloc_shared_noncontig)
                total_shm_size += MPIDU_shm_get_mapsize(shared_table[i].size, &page_sz);
            else
                total_shm_size += shared_table[i].size;
        }

jeffhammond avatar Aug 15 '22 10:08 jeffhammond

The comment doesn't make sense and should probably be deleted. The code it is attached to just adds up the local sizes. The "allreduce" remark was likely meant to refer to replacing the allgather of the shared table, but it appears the shared table needs to be allgathered anyway. As it stands, the comment is a red herring.
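
To illustrate why no collective is involved at this step, here is a minimal standalone sketch of the same loop: once every process already holds all the sizes, each one can compute the offsets locally as an exclusive prefix sum. This is not the MPICH code. The names `round_up_to_page` and `compute_shm_offsets` are hypothetical, and rounding up to a page multiple is only an assumed stand-in for whatever `MPIDU_shm_get_mapsize` actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MPIDU_shm_get_mapsize (an assumption, not the
 * MPICH implementation): round size up to a multiple of page_sz. */
size_t round_up_to_page(size_t size, size_t page_sz)
{
    assert(page_sz > 0);
    return ((size + page_sz - 1) / page_sz) * page_sz;
}

/* Exclusive prefix sum over the per-process segment sizes.
 * offsets[i] receives the start of process i's segment; the return value
 * is the total size of the shared region. The loop is O(n) in the number
 * of local processes and needs no communication once sizes[] is known. */
size_t compute_shm_offsets(const size_t *sizes, int n, int noncontig,
                           size_t page_sz, size_t *offsets)
{
    size_t total = 0;
    for (int i = 0; i < n; i++) {
        offsets[i] = total;
        if (noncontig)
            total += round_up_to_page(sizes[i], page_sz);
        else
            total += sizes[i];
    }
    return total;
}
```

Even with a few hundred processes per node, the loop itself is only a few hundred additions, so the cost that scales with the size of the shared-memory domain is gathering the sizes, not summing them.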

hzhou avatar Aug 15 '22 16:08 hzhou