Can't perform independent write when MPI_File_sync is required by ROMIO driver.
Hello. Recently I've been running an MPI job for extensive data processing. However, I'm getting the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver" in the log.
The symptoms are as follows:
- It works fine on the local (master) machine.
- When run on the compute nodes, it gives the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver".
- With dxpl_mpio=:collective, it gets stuck at the write.
The local machine is directly attached to the disk where I write the HDF5 file, while the remote nodes access their disk over NFS.
My questions are: why does that error appear? Does it appear because of NFS? If it is avoidable, how?
Also, the article https://www.hdfgroup.org/2015/08/parallel-io-with-hdf5/ mentions the H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?
Thanks.
Here is my test code.
using HDF5
using MPI

function main()
    @assert HDF5.has_parallel()

    MPI.Init()
    comm = MPI.COMM_WORLD
    info = MPI.Info()

    ff = h5open("test.h5", "w", comm, info)
    MPI.Barrier(comm)

    Nproc = MPI.Comm_size(comm)
    myrank = MPI.Comm_rank(comm)

    M = 10
    A = fill(myrank, M, 2)     # local data
    dims = (M, Nproc*2 + 1)    # dimensions of global data

    # Create dataset
    @show "Create dataset"
    dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims),
                          chunk=(M, 2), dxpl_mpio=:collective)
    @show "After dataset"

    # Write local data
    dset[:, 2*myrank + 1:2*myrank + 2] = A
    @show "After write dataset"

    close(ff)
    MPI.Finalize()
end

main()
And here is the output of MPIPreferences.use_system_binary():
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi"
│ version_string = "MPICH Version: 4.1.2\nMPICH Release date: Wed Jun 7 15:22:45 CDT 2023\nMPICH ABI: 15:1:3\nMPICH Device: ch4:ofi\nMPICH configure: --prefix=/home/---/tools/mpich --with-ucx=/home/---/tools/ucx\nMPICH CC: /home/---/tools/gcc/bin/gcc -O2\nMPICH CXX: /home/hyunwook/tools/gcc/bin/g++ -O2\nMPICH F77: /home/---/tools/gcc/bin/gfortran -O2\nMPICH FC: /home/---/tools/gcc/bin/gfortran -O2\n"
│ impl = "MPICH"
│ version = v"4.1.2"
└ abi = "MPICH"
┌ Info: MPIPreferences unchanged
│ binary = "system"
│ libmpi = "libmpi"
│ abi = "MPICH"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing
Run script (for sbatch)
#!/bin/bash
#SBATCH -J hdf5_test
#SBATCH -o stdout_log.o%j
#SBATCH -N 1
#SBATCH -n 32
mpiexec.hydra -np $SLURM_NTASKS julia test.jl
My environment:
- CentOS 7.5
- Slurm with Hydra
- HDF5 1.14.1
- GCC 13.2.0
- MPICH 4.1.2 (yes, I built HDF5, GCC, and MPICH from source)
@simonbyrne might be best equipped to answer the overall question.
There is an H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?
We don't have a pregenerated binding for H5Sselect_none in HDF5.jl yet. Based on the auto-generated bindings in LibHDF5.jl, you can invoke the ccall directly:
https://github.com/mkitti/LibHDF5.jl/blob/712b6e306a15de37f748727b37676aca70ea0664/src/LibHDF5.jl#L3816-L3818
julia> import HDF5.API.HDF5_jll: libhdf5
julia> import HDF5.API: herr_t, hid_t
julia> function H5Sselect_none(spaceid)
           ccall((:H5Sselect_none, libhdf5), herr_t, (hid_t,), spaceid)
       end
H5Sselect_none (generic function with 1 method)
julia> dspace = dataspace((1,1))
HDF5.Dataspace: (1, 1)
julia> H5Sselect_none(dspace)
0
julia> dspace
HDF5.Dataspace: (1, 1) [irregular selection]
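To connect this back to the article's point about collective mode: every rank must make the same collective write call, and a rank with nothing to write does so with an empty selection. Below is a rough, untested sketch of how that could look using the wrapper defined above. The i_have_data flag is hypothetical, and the plumbing through HDF5.DatasetTransferProperties, HDF5.select_hyperslab!, and HDF5.API.h5d_write is my assumption about HDF5.jl's lower-level interface rather than a documented recipe; dset, A, and myrank are as in the test code earlier in the thread.

# Rough sketch (assumptions noted above): all ranks call the collective write,
# but a rank without data selects "none" so it contributes nothing.
filespace = dataspace(dset)      # selection within the file
memspace  = dataspace(A)         # selection within the memory buffer
dxpl      = HDF5.DatasetTransferProperties(dxpl_mpio = :collective)  # assumed constructor

if i_have_data                   # hypothetical per-rank flag
    HDF5.select_hyperslab!(filespace, (1:size(A, 1), 2*myrank + 1:2*myrank + 2))
else
    H5Sselect_none(filespace)    # this rank writes nothing
    H5Sselect_none(memspace)
end

# Same call on every rank; with an empty selection the buffer contents are ignored.
HDF5.API.h5d_write(dset, datatype(A), memspace, filespace, dxpl, A)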
It could be that you are still using the HDF5 library linked against the bundled MPI library (i.e. not the system one).
You either need to specify it (currently you need to set JULIA_HDF5_PATH), or use MPItrampoline (which requires building a wrapper around your system MPI library).
If that is not the case, does it work without the chunk option?
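In case it helps with checking that, a quick sanity check of which libraries are actually being picked up (both calls below are standard MPI.jl / HDF5.jl functions) might look like:

using MPI, HDF5
@show MPI.identify_implementation()   # should report the system MPICH, not the bundled binary
@show HDF5.has_parallel()             # must be true for h5open(..., comm, info) to work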
- The system MPI library (MPICH, built from source) was used.
- JULIA_HDF5_PATH was set properly.
- With or without chunk, in independent I/O mode, it still gives the same error.
I think this error occurs because of NFS (based on this issue: https://forum.hdfgroup.org/t/hang-for-mpi-hdf5-in-parallel-on-an-nfs-system/6541/3). Collective mode now appears to be working, so I'm going with that.
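If anyone else hits this on NFS and wants to experiment further, one hedged idea (a sketch, not a confirmed fix) is to pass ROMIO hints through the MPI.Info object that h5open already accepts; romio_ds_write and romio_cb_write are standard ROMIO hint names, but whether they help on a particular NFS mount would need testing.

# Sketch: standard ROMIO hints passed via MPI.Info (values are strings);
# not a confirmed fix for the MPI_File_sync error, just something to try.
info = MPI.Info(:romio_ds_write => "disable",   # disable data sieving on writes
                :romio_cb_write => "enable")    # force collective buffering
ff = h5open("test.h5", "w", comm, info)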