mpich bug: collective read past end of file

Originally by robl on 2010-08-31 16:56:19 -0500

Pascal D. reported that MPI_Get_count, called after a read past end of file, returns the count of bytes requested. It should instead return the count of bytes actually read.

Oct 14 '16 16:10 mpichbot

Originally by robl on 2010-08-31 17:01:01 -0500

I looked at this for a while and got close but not quite there.

The easy approach is to have every process report how many bytes it actually read, by percolating the result of the underlying contiguous read request back up the stack. That, however, does not play well with I/O aggregation. In particular, this test case has all processes reading from the same file at the same offsets, so only one process actually does the I/O.

A real fix would have to report how many bytes were placed into memory.
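To make the "bytes placed into memory" idea concrete, here is a minimal sketch. It assumes a single aggregator that read one contiguous range and a contiguous request per process. `bytes_placed` and its parameters are hypothetical names for illustration, not ROMIO functions, and real collective I/O with noncontiguous file views would need more than this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ROMIO): given the range one process asked
 * for and the range the aggregator actually got back from the file system,
 * return how many bytes of the request can be copied into that process's
 * buffer. This is the overlap of the two half-open byte ranges.
 */
static int64_t bytes_placed(int64_t req_off, int64_t req_len,
                            int64_t agg_off, int64_t agg_got)
{
    assert(req_off >= 0 && req_len >= 0 && agg_off >= 0 && agg_got >= 0);

    int64_t req_end = req_off + req_len;   /* one past the last requested byte */
    int64_t agg_end = agg_off + agg_got;   /* one past the last byte actually read */

    int64_t lo = req_off > agg_off ? req_off : agg_off;
    int64_t hi = req_end < agg_end ? req_end : agg_end;

    return hi > lo ? hi - lo : 0;          /* empty overlap means nothing was placed */
}
```

If the aggregator asked for 10 bytes at offset 0 and the file system returned only 5, `bytes_placed(0, 10, 0, 5)` gives 5 for every process that requested the same range, which is the value the status would need to carry even on processes that did no I/O themselves.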

Oct 14 '16 16:10 mpichbot

Originally by robl on 2010-09-01 15:20:41 -0500

Attachment added: end_of_file.c (2.0 KiB) update to fix buffer checking.

Oct 14 '16 16:10 mpichbot

mpicc -o end_of_file end_of_file.c && mpirun -n 2 ./end_of_file tmp.txt
0: count was 10; expected 5
0: buffer[5] = 0; expected 99
0: buffer[6] = 0; expected 99
0: buffer[7] = 0; expected 99
0: buffer[8] = 0; expected 99
0: buffer[9] = 0; expected 99
Found 12 errors
1: count was 10; expected 5
1: buffer[5] = 0; expected 99
1: buffer[6] = 0; expected 99
1: buffer[7] = 0; expected 99
1: buffer[8] = 0; expected 99
1: buffer[9] = 0; expected 99

Mar 21 '22 16:03 hzhou

@roblatham00 Can we check the file size up front and adjust the parameter before the actual read?
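As a rough sketch of what "adjust before the read" could look like, assuming the file size is already known and the request is a contiguous byte range: `clamp_to_eof` is a hypothetical helper name, not an existing MPICH or ROMIO function, and the sketch ignores the case where another writer changes the file size between the check and the read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return how many of the `count` bytes starting at
 * `offset` lie before end of file, given a previously obtained `file_size`.
 */
static int64_t clamp_to_eof(int64_t offset, int64_t count, int64_t file_size)
{
    assert(offset >= 0 && count >= 0 && file_size >= 0);

    if (offset >= file_size)
        return 0;                          /* request starts at or past EOF */

    int64_t avail = file_size - offset;    /* bytes between offset and EOF */
    return count < avail ? count : avail;
}
```

With a 5-byte file, `clamp_to_eof(0, 10, 5)` yields 5, so both the read itself and the count reported afterwards could be based on the clamped value.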

Mar 21 '22 16:03 hzhou

Getting the file size can be really expensive, but MPI_File_open is collective and we already stat the file if no file system prefix is provided. We should be able to collect some information about the file at open time from one process and broadcast it to everyone else.

Not every file system supports stat. For non-POSIX file systems, we should add another member to the function pointer struct. While we are at it, we should get the block size too. Stash these values in the ROMIO file descriptor struct.

You can see ROMIO doing a bit of this already in the "Open collective" routine. It helped us shave a few seconds off of metadata-intensive workloads (LAMMPS) back in the Blue Gene days. Seems like a good idea to make it more general.
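A minimal sketch of that shape, kept MPI-free so it compiles on its own: every struct and function name below (`file_info`, `fs_ops`, `file_desc`, `posix_query_info`, `cache_info_at_open`) is hypothetical and does not match ROMIO's real definitions. The POSIX driver uses `stat()`, and a non-POSIX driver would supply its own `query_info` or leave it NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical: the values we would like to collect once at open time. */
struct file_info {
    int64_t size;      /* file size in bytes */
    int64_t blksize;   /* preferred I/O block size */
};

/* Hypothetical extra member for a per-file-system function pointer struct. */
struct fs_ops {
    int (*query_info)(const char *path, struct file_info *out);
};

/* Hypothetical stand-in for a file descriptor struct that stashes the values. */
struct file_desc {
    const struct fs_ops *ops;
    struct file_info info;
    int info_valid;    /* 0 until query_info has succeeded */
};

/* POSIX implementation of the query, backed by stat(). */
static int posix_query_info(const char *path, struct file_info *out)
{
    struct stat sb;

    assert(path != NULL && out != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    out->size = (int64_t) sb.st_size;
    out->blksize = (int64_t) sb.st_blksize;
    return 0;
}

/*
 * Called by one process during the collective open. In a real collective
 * open, that process would then broadcast fd->info to the others (for
 * example with MPI_Bcast); that step is omitted here.
 */
static int cache_info_at_open(struct file_desc *fd, const char *path)
{
    assert(fd != NULL);
    fd->info_valid = 0;
    if (fd->ops == NULL || fd->ops->query_info == NULL)
        return -1;     /* this file system cannot answer the query */
    if (fd->ops->query_info(path, &fd->info) != 0)
        return -1;
    fd->info_valid = 1;
    return 0;
}
```

Keeping an `info_valid` flag lets callers fall back to the current behavior on file systems that cannot report a size, rather than trusting a zeroed struct.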

Mar 22 '22 13:03 roblatham00