
ROMIO: incorrect flattened file offset-length pairs when using MPICH 4.0.3 and prior

Open • wkliao opened this issue 1 year ago • 0 comments

With MPICH 4.0.3 and earlier, I get incorrect I/O results. The problem appears when MPI_Type_indexed() or MPI_Type_create_hindexed() is used to concatenate multiple copies of a datatype created with MPI_Type_create_subarray(), and the concatenated datatype is then used to define the file view of an opened file. Adding printf statements inside ROMIO shows that the flattened file offset-length pairs are incorrect.
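To make "correct offset-length pairs" concrete, here is a minimal sketch in plain C (no MPI calls) of what a correct flattening should produce for this kind of filetype. It assumes a 2-D, C-order subarray (MPI_ORDER_C) replicated with blocklength 1 at byte displacements, as MPI_Type_create_hindexed would place it, and it merges adjacent contiguous pieces for easier comparison. The names `off_len_t`, `append_pair`, and `flatten_hindexed_subarray2d` are hypothetical and are not part of ROMIO; the output is only a reference to compare against the pairs printed from inside ROMIO.

```c
/* Hypothetical reference flattener, NOT ROMIO code. */
typedef struct {
    long long off; /* byte offset relative to the file view displacement */
    long long len; /* length in bytes */
} off_len_t;

/* Append one (offset, length) piece, merging it into the previous pair
 * when it starts exactly where that pair ends. Returns the new pair
 * count, or -1 if max_pairs would be exceeded. */
static int append_pair(off_len_t *pairs, int n, int max_pairs,
                       long long off, long long len)
{
    if (n > 0 && pairs[n - 1].off + pairs[n - 1].len == off) {
        pairs[n - 1].len += len;
        return n;
    }
    if (n >= max_pairs)
        return -1;
    pairs[n].off = off;
    pairs[n].len = len;
    return n + 1;
}

/* Flatten `count` copies of a 2-D C-order subarray (sizes, subsizes,
 * starts, element size esize in bytes) placed at byte displacements
 * disps[k], in typemap order. Returns the number of pairs, or -1. */
int flatten_hindexed_subarray2d(const int sizes[2], const int subsizes[2],
                                const int starts[2], long long esize,
                                int count, const long long disps[],
                                off_len_t *pairs, int max_pairs)
{
    int n = 0;
    for (int k = 0; k < count; k++) {
        for (int i = 0; i < subsizes[0]; i++) {
            /* Each subarray row is one contiguous run of subsizes[1] elements. */
            long long off = disps[k] +
                ((long long)(starts[0] + i) * sizes[1] + starts[1]) * esize;
            n = append_pair(pairs, n, max_pairs, off,
                            (long long)subsizes[1] * esize);
            if (n < 0)
                return -1;
        }
    }
    return n;
}
```

For example, a 4×6 array of 4-byte elements with a 2×3 subarray starting at (1, 2), replicated at byte displacements 0 and 96 (the subarray type's extent), should flatten to (32, 12), (56, 12), (128, 12), (152, 12).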

This issue does not occur with MPICH 4.1 and later. As pointed out in issue #7163, this is because, since version 4.1, ROMIO flattens the file view datatype using the MPIX_Type_iov functions, which appear to produce correct offset-length pairs.

I encountered this problem when running jobs on Perlmutter at NERSC, whose MPI library uses a ROMIO taken from an MPICH release earlier than 4.1. I believe other MPI vendors also ship older versions of ROMIO.

Smaller reproducers are available in

The two programs differ in the datatype constructor: one calls MPI_Type_indexed() and the other MPI_Type_create_hindexed(). If these calls are replaced by MPI_Type_create_struct(), as in struct_fsize.c, the generated offset-length pairs are correct and the test program passes.
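All three constructors should describe the same file layout once their displacements are expressed in the same unit: MPI_Type_indexed counts displacements in multiples of the old type's extent, while MPI_Type_create_hindexed and MPI_Type_create_struct count them in bytes. A minimal sketch of that conversion, assuming a 2-D subarray filetype whose extent is that of the full array; the helper names `subarray2d_extent` and `indexed_to_byte_disps` are hypothetical and do not come from the reproducers.

```c
/* Hypothetical helper: extent in bytes of a 2-D subarray filetype.
 * MPI gives a subarray type the extent of the full array, i.e.
 * sizes[0] * sizes[1] * (extent of the element type). */
long long subarray2d_extent(const int sizes[2], long long esize)
{
    return (long long)sizes[0] * sizes[1] * esize;
}

/* Hypothetical helper: convert MPI_Type_indexed displacements (in units
 * of the old type's extent) to the byte displacements that
 * MPI_Type_create_hindexed or MPI_Type_create_struct would use. */
void indexed_to_byte_disps(int count, const int idx_disps[],
                           long long extent, long long byte_disps[])
{
    for (int k = 0; k < count; k++)
        byte_disps[k] = (long long)idx_disps[k] * extent;
}
```

For a 4×6 array of 4-byte elements, an indexed displacement list {0, 1, 3} corresponds to byte displacements {0, 96, 288}, so the indexed, hindexed, and struct versions should all flatten to identical offset-length pairs.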

wkliao, Oct 10 '24 19:10