samurai
[Bug]: Issue with HDF5 using MPI
What happened?
I want to run the linear-convection demo with MPI.
- I compile using `cmake -DWITH_MPI=ON ..` with the unreleased samurai recipe
- I compile and run the test case in MPI using `mpirun -np 2 ./linear-convection`

And I get the following HDF5 error:
(base) sbstndbs@sbstndbs:~/samurai/build$ mpirun -np 2 ./demos/FiniteVolume/finite-volume-linear-convection --max-level 10 --min-level 10
------------------------- Linear convection -------------------------
------------------------- Linear convection -------------------------
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
#000: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5F.c line 653 in H5Fcreate(): unable to synchronously create file
major: File accessibility
minor: Unable to create file
#001: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5F.c line 608 in H5F__create_api_common(): unable to create file
major: File accessibility
minor: Unable to open file
#002: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5VLcallback.c line 3445 in H5VL_file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#003: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5VLcallback.c line 3411 in H5VL__file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#004: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5VLnative_file.c line 94 in H5VL__native_file_create(): unable to create file
major: File accessibility
minor: Unable to open file
#005: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5Fint.c line 1963 in H5F_open(): unable to lock the file
major: File accessibility
minor: Unable to lock file
#006: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5FD.c line 2402 in H5FD_lock(): driver lock request failed
major: Virtual File Layer
minor: Unable to lock file
#007: /tmp/sbstndbs/spack-stage/spack-stage-hdf5-1.14.5-j55c5gq5llfw3dkex26siibxjdh6nyxy/spack-src/src/H5FDsec2.c line 956 in H5FD__sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
major: Virtual File Layer
minor: Unable to lock file
terminate called after throwing an instance of 'HighFive::FileException'
what(): Failed to create file /home/sbstndbs/samurai/build/linear_convection_2D_restart_ite_0.h5 (Virtual File Layer) Unable to lock file
- iteration 0: t = 0.00, dt = 0.000927734375
As we can see, this issue is only related to the restart file.
It comes from the following line: https://github.com/hpc-maths/samurai/blob/a897435d34432257641086c4997b22a2b55631df/demos/FiniteVolume/linear_convection.cpp#L32C13-L32C81
When I comment out this line, everything runs fine.
My hdf5 and highfive packages are compiled with spack with the following arguments:
hdf5@1.14.5~cxx~fortran~hl~ipo~java~map+mpi+shared~subfiling~szip~threadsafe+tools api=default build_system=cmake
[email protected]~boost~ipo+mpi build_system=cmake build_type=Release generator=make
Input code
template <class Field>
void save(const fs::path& path, const std::string& filename, const Field& u, const std::string& suffix = "")
{
    auto mesh   = u.mesh();
    auto level_ = samurai::make_field<std::size_t, 1>("level", mesh);

    if (!fs::exists(path))
    {
        fs::create_directory(path);
    }

    samurai::for_each_cell(mesh,
                           [&](const auto& cell)
                           {
                               level_[cell] = cell.level;
                           });

    samurai::save(path, fmt::format("{}{}", filename, suffix), mesh, u, level_);
    // The following line is buggy in MPI
    samurai::dump(path, fmt::format("{}_restart{}", filename, suffix), mesh, u);
}
What expected?
I expected the testcase to run fine by default
What is your operating system?
Linux
How did you install our software?
from source
Software version
0.22.0
Relevant log output
Code of Conduct
- [x] I agree to follow this project's Code of Conduct