Can't perform independent write when MPI_File_sync is required by ROMIO driver.
Hello. Recently I've been running an MPI job for extensive data processing. However, I'm getting the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver" in the log.
The symptoms are as follows:
- It works fine on the local (master) machine.
- When run on the compute nodes, it gives the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver".
- With dxpl_mpio=:collective, it gets stuck at the write.
The local machine is directly attached to the disk where I write the HDF5 file, while the remote nodes access their disk over NFS.
My questions are: why does that error appear? Does it appear because of NFS? If it is avoidable, how?
Also, the article https://www.hdfgroup.org/2015/08/parallel-io-with-hdf5/ mentions the H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?
Thanks.
Here is my test code.
using HDF5
using MPI

function main()
    @assert HDF5.has_parallel()

    MPI.Init()
    comm = MPI.COMM_WORLD
    info = MPI.Info()

    ff = h5open("test.h5", "w", comm, info)
    MPI.Barrier(comm)

    Nproc = MPI.Comm_size(comm)
    myrank = MPI.Comm_rank(comm)

    M = 10
    A = fill(myrank, M, 2)     # local data
    dims = (M, Nproc*2 + 1)    # dimensions of global data

    # Create dataset
    @show "Create dataset"
    dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims),
                          chunk=(M, 2), dxpl_mpio=:collective)
    @show "After dataset"

    # Write local data
    dset[:, 2*myrank + 1:2*myrank + 2] = A
    @show "After write dataset"

    close(ff)
    MPI.Finalize()
end

main()
And here is the output of MPIPreferences.use_system_binary():
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi"
│ version_string = "MPICH Version: 4.1.2\nMPICH Release date: Wed Jun 7 15:22:45 CDT 2023\nMPICH ABI: 15:1:3\nMPICH Device: ch4:ofi\nMPICH configure: --prefix=/home/---/tools/mpich --with-ucx=/home/---/tools/ucx\nMPICH CC: /home/---/tools/gcc/bin/gcc -O2\nMPICH CXX: /home/hyunwook/tools/gcc/bin/g++ -O2\nMPICH F77: /home/---/tools/gcc/bin/gfortran -O2\nMPICH FC: /home/---/tools/gcc/bin/gfortran -O2\n"
│ impl = "MPICH"
│ version = v"4.1.2"
└ abi = "MPICH"
┌ Info: MPIPreferences unchanged
│ binary = "system"
│ libmpi = "libmpi"
│ abi = "MPICH"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing
Run script (for sbatch)
#!/bin/bash
#SBATCH -J hdf5_test
#SBATCH -o stdout_log.o%j
#SBATCH -N 1
#SBATCH -n 32
mpiexec.hydra -np $SLURM_NTASKS julia test.jl
My environment:
- CentOS 7.5
- Slurm with Hydra
- HDF5 1.14.1
- GCC 13.2.0
- MPICH 4.1.2 (yes, I built HDF5, GCC, and MPICH from source)
@simonbyrne might be best equipped to answer the overall question.
There is an H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?
We don't have a pregenerated binding for H5Sselect_none in HDF5.jl yet. Based on the auto-generated bindings in LibHDF5.jl, you can invoke the ccall directly:
https://github.com/mkitti/LibHDF5.jl/blob/712b6e306a15de37f748727b37676aca70ea0664/src/LibHDF5.jl#L3816-L3818
julia> import HDF5.API.HDF5_jll: libhdf5
julia> import HDF5.API: herr_t, hid_t
julia> function H5Sselect_none(spaceid)
           ccall((:H5Sselect_none, libhdf5), herr_t, (hid_t,), spaceid)
       end
H5Sselect_none (generic function with 1 method)
julia> dspace = dataspace((1,1))
HDF5.Dataspace: (1, 1)
julia> H5Sselect_none(dspace)
0
julia> dspace
HDF5.Dataspace: (1, 1) [irregular selection]
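To connect this back to the article's point about collective mode: every rank must make the same collective write call, and a rank with nothing to write does so with an empty selection. Below is a rough, untested sketch of how that could look using the wrapper defined above. The i_have_data flag is hypothetical, and the plumbing through HDF5.DatasetTransferProperties, HDF5.select_hyperslab!, and HDF5.API.h5d_write is my assumption about HDF5.jl's lower-level interface rather than a documented recipe; dset, A, and myrank are as in the test code earlier in the thread.

# Rough sketch (assumptions noted above): all ranks call the collective write,
# but a rank without data selects "none" so it contributes nothing.
filespace = dataspace(dset)      # selection within the file
memspace  = dataspace(A)         # selection within the memory buffer
dxpl      = HDF5.DatasetTransferProperties(dxpl_mpio = :collective)  # assumed constructor

if i_have_data                   # hypothetical per-rank flag
    HDF5.select_hyperslab!(filespace, (1:size(A, 1), 2*myrank + 1:2*myrank + 2))
else
    H5Sselect_none(filespace)    # this rank writes nothing
    H5Sselect_none(memspace)
end

# Same call on every rank; with an empty selection the buffer contents are ignored.
HDF5.API.h5d_write(dset, datatype(A), memspace, filespace, dxpl, A)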
It could be that you are still using the HDF5 library linked against the bundled MPI library (i.e. not the system one).
You either need to specify it (currently you need to set JULIA_HDF5_PATH), or use MPItrampoline (which requires building a wrapper around your system MPI library).
If that is not the case, does it work without the chunk option?
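In case it helps with checking that, a quick sanity check of which libraries are actually being picked up (both calls below are standard MPI.jl / HDF5.jl functions) might look like:

using MPI, HDF5
@show MPI.identify_implementation()   # should report the system MPICH, not the bundled binary
@show HDF5.has_parallel()             # must be true for h5open(..., comm, info) to work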
- The system MPI library (MPICH, built from source) was used.
- JULIA_HDF5_PATH was set properly.
- With or without chunk, in independent I/O mode, it still gives the same error.
I think this error occurs because of NFS (based on this issue: https://forum.hdfgroup.org/t/hang-for-mpi-hdf5-in-parallel-on-an-nfs-system/6541/3). Collective mode now appears to be working, so I'm going with that.
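If anyone else hits this on NFS and wants to experiment further, one hedged idea (a sketch, not a confirmed fix) is to pass ROMIO hints through the MPI.Info object that h5open already accepts; romio_ds_write and romio_cb_write are standard ROMIO hint names, but whether they help on a particular NFS mount would need testing.

# Sketch: standard ROMIO hints passed via MPI.Info (values are strings);
# not a confirmed fix for the MPI_File_sync error, just something to try.
info = MPI.Info(:romio_ds_write => "disable",   # disable data sieving on writes
                :romio_cb_write => "enable")    # force collective buffering
ff = h5open("test.h5", "w", comm, info)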