mpich
ROMIO: incorrect flattened file offset-length pairs when using MPICH 4.0.3 and prior
When using MPICH 4.0.3 and prior, I get incorrect I/O results.
This happens when MPI_Type_indexed() or MPI_Type_create_hindexed() is used
to concatenate multiple datatypes created by MPI_Type_create_subarray().
The concatenated datatype is then used to define the fileview after opening
a file. Adding printf statements inside ROMIO reveals incorrect flattened
file offset-length pairs.
This issue does not happen when using MPICH 4.1 and later. As pointed out
in issue #7163, this is because the MPIX_Type_iov functions have been used
to flatten the fileview datatype since version 4.1, and those functions
appear to produce correct offset-length pairs.
I encountered this problem when running jobs on Perlmutter at NERSC, whose MPI uses a ROMIO from an MPICH release earlier than 4.1. I believe other MPI vendors also ship an older version of ROMIO.
Smaller reproducers are available in
The only difference between the two programs is whether they call
MPI_Type_indexed() or MPI_Type_create_hindexed().
If these calls are replaced by MPI_Type_create_struct(), as
in struct_fsize.c, then the generated offset-length pairs
are correct and the test program passes.