
amo/rma=direct is broken

Open · jeffhammond opened this issue 4 years ago · 2 comments

I suspect this happens because I configured OSHMPI with amo/rma=direct (--enable-amo=direct --enable-rma=direct), but I don't think the library should crash when these options are set.

Error

~/SHMEM/bale/src/bale_classic$ $HOME/SHMEM/mpich-ch4-ucx/bin/mpirun -n 1 ./build_unknown/bin/transpose_matrix

***************************************************************
Bale Version 3.00 (OpenShmem version 1.4): 2111-07-27.05:08
Running command on 1 PEs: ./build_unknown/bin/transpose_matrix
***************************************************************

Input Graph/Matrix parameters:
----------------------------------------------------
Graph model: FLAT        (-F).
Undirected, Unweighted, No Loops
Number of rows           (-N): 500000
Avg # nnz per row        (-z): 10.00
Edge probability         (-e): 0.000040

Standard options:
----------------------------------------------------
buf_cnt (buffer size)    (-b): 1024
seed                     (-s): 122222
cores_per_node           (-c): 0
Models Mask              (-M): 15

Input matrix:
----------------------------------------------------
	500000 rows
	500000 columns
	4998270 nonzeros

           AGP:   23.690
       Exstack:    0.468
Abort(202008595) on node 0 (rank 0 in comm 0): Fatal error in internal_Test: Request pending due to failure, error stack:
internal_Test(92): MPI_Test(request=0x7f043e0efbf0, flag=0x7fff42e74f0c, status=0x7fff42e74f10) failed
internal_Test(47): Invalid MPI_Request

Application info

https://github.com/jdevinney/bale

./bootstrap.sh 
python3 ./make_bale -f -s -c CC=$HOME/SHMEM/oshmpi-v2-install/bin/oshcc
$HOME/SHMEM/mpich-ch4-ucx/bin/mpirun -n 1 ./build_unknown/bin/transpose_matrix

OSHMPI info

./configure  --enable-amo=direct  --enable-rma=direct CC=$HOME/SHMEM/mpich-ch4-ucx/bin/mpicc CXX=$HOME/SHMEM/mpich-ch4-ucx/bin/mpicxx --prefix=$HOME/SHMEM/oshmpi-v2-install
commit ba66186a4b968c3d4cdb63027d9e45e23456ab1a (HEAD -> mpi-4-configure-test, origin/mpi-4-configure-test)
Author: Jeff Hammond <>
Date:   Tue Jul 27 04:59:34 2021 -0700

    support MPI_VERSION=4
    
    configure test was MPI_VERSION != 3 when the requirement
    is MPI_VERSION >= 3.
    
    resolves issue #129
    
    Signed-off-by: Jeff Hammond <>

MPI info

~/SHMEM/bale/src/bale_classic$ $HOME/SHMEM/mpich-ch4-ucx/bin/mpichversion 
MPICH Version:    	4.0a2
MPICH Release date:	unreleased development copy
MPICH Device:    	ch4:ucx
MPICH configure: 	--prefix=$HOME/SHMEM/mpich-ch4-ucx CC=gcc --without-fortran --with-device=ch4:ucx
MPICH CC: 	gcc    -O2
MPICH CXX: 	g++   -O2
MPICH F77: 	gfortran   -O2
MPICH FC: 	gfortran   -O2
MPICH Custom Information: 

jeffhammond · Jul 27 '21 12:07

Yeah, it seems that amo/rma=direct is broken.

jeffhammond · Jul 27 '21 12:07