Write HDF5 Error for large matrix
Hi all, I hope you are doing well in the current virus epidemic. I understand that you may have more important things to do right now; I am just posting the error here in case you get a chance to look into it.
Recently, I tested the function StoreHDF::write and found that it works on a small matrix but fails on a large array when run on multiple processes (CPU units). Below is the code:
#include <libdash.h>
#include <cstdlib>
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201
#define N2 15000

int main(int argc, char *argv[])
{
    dash::init(&argc, &argv);
    dash::Matrix<double, 2> *h5matrix = new dash::Matrix<double, 2>(dash::SizeSpec<2>(N1, N2));
    auto myid = dash::myid();

    // Unit 0 fills the whole matrix.
    if (!myid)
    {
        for (int i = 0; i < N1; i++)
        {
            for (int j = 0; j < N2; j++)
                h5matrix->at(i, j) = i + j;
        }
    }

    // Write the matrix to disk and read it back.
    StoreHDF::write(*h5matrix, "testf.h5", "testg/testd2D");
    StoreHDF::read(*h5matrix, "testf.h5", "testg/testd2D");

    // Unit 1 verifies the values that were read back.
    if (myid == 1)
    {
        for (int i = 0; i < N1; i++)
        {
            for (int j = 0; j < N2; j++)
            {
                double t = h5matrix->at(i, j);
                if (t != (i + j))
                {
                    std::cout << "Wrong result \n";
                    exit(-1);
                }
            }
        }
    }

    dash::finalize();
    return 0;
}
I compiled the code and ran it with 2 processes, and it reports the error below. Note that if you change N1 and N2 in the code to small numbers, e.g., 10 by 10, it works.
>> mpirun -n 2 ./h5-test
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 1:
#000: H5Dio.c line 322 in H5Dwrite(): could not get a validated dataspace from file_space_id
major: Invalid arguments to routine
minor: Bad value
#001: H5S.c line 254 in H5S_get_validated_dataspace(): selection + offset not within extent
major: Dataspace
minor: Out of range
^C[mpiexec@dbinMac] Sending Ctrl-C to processes as requested
[mpiexec@dbinMac] Press Ctrl-C again to force abort
Hi DASH community, I am just checking whether someone can help look into this issue.
Bests, Bin
I will look into it this week.
The problem is not the total size of the NArray; it is the size of the extent. For example, 21 x 20 also results in an error. With 21 rows, the first rank has 11 elements in the first dimension, but the second one has only 10. This seems to be the problem. I tried it with the OutputStream; in the end it should give the same result.
#include <libdash.h>
#include <cstdlib>
#include <string>

#define N1 200
#define N2 15000
#define FILENAME "example.hdf5"

int main(int argc, char *argv[])
{
    dash::init(&argc, &argv);
    dash::Matrix<double, 2> h5matrix(dash::SizeSpec<2>(N1, N2));
    auto myid = dash::myid();

    // Unit 0 fills the whole matrix.
    if (!myid) {
        for (int i = 0; i < N1; i++) {
            for (int j = 0; j < N2; j++)
                h5matrix.at(i, j) = i + j;
        }
    }

    // Write the matrix through the HDF5 output stream.
    dash::io::hdf5::OutputStream os(FILENAME);
    os << dash::io::hdf5::dataset("group/data") << h5matrix;
    dash::barrier();

    // Unit 0 dumps the resulting file for inspection.
    if (dash::myid() == 0) {
        std::string syscall = "h5dump ";
        auto status = system((syscall + FILENAME).c_str());
    }

    dash::finalize();
    return 0;
}
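To illustrate the arithmetic behind the 21-row case: the sketch below assumes a blocked split of one dimension with ceiling division, which matches the 11-element block mentioned above but is not taken from the DASH sources. All function names are hypothetical. It shows that a selection assuming a full block on every unit runs past the extent whenever the extent is not divisible by the number of units, which is consistent with 201 rows failing on 2 processes while 10 rows work.

```cpp
#include <cassert>
#include <cstddef>

// Block size when `extent` elements are split over `nunits` units
// (ceiling division; assumption for illustration, e.g. 21 over 2 -> 11).
std::size_t blocksize(std::size_t extent, std::size_t nunits) {
    assert(nunits > 0);
    return (extent + nunits - 1) / nunits;
}

// Number of elements unit `unit` actually owns.
// The last non-empty block may be underfilled.
std::size_t local_extent(std::size_t extent, std::size_t nunits,
                         std::size_t unit) {
    const std::size_t bs = blocksize(extent, nunits);
    const std::size_t offset = unit * bs;
    if (offset >= extent) return 0;
    return (extent - offset < bs) ? extent - offset : bs;
}

// True if selecting a full block on every unit would exceed the extent,
// i.e. the situation HDF5 reports as
// "selection + offset not within extent".
bool full_block_selection_overflows(std::size_t extent, std::size_t nunits) {
    return nunits * blocksize(extent, nunits) > extent;
}
```

With 2 units, `local_extent(21, 2, 1)` is 10 rather than 11, and `full_block_selection_overflows` is true for 21 and 201 but false for 10.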
It is a bug in the TilePattern. If you use the proxy dash::NArray instead of dash::Matrix, it should work. dash::Matrix uses the TilePattern by default, while dash::NArray uses a BlockPattern instead. That's the only difference.
A small workaround until we have fixed the pattern.
I fixed it, but if you compile DASH with assertions enabled, you will get an error when using the TilePattern with underfilled blocks. @devreal and @fuchsto: Why can't a TilePattern have underfilled blocks? What was the reason for forbidding it?
I believe that is a longstanding issue that has never been properly implemented. If someone has a patch I would love that...
The solution would be the same as for the BlockPattern: only the last block is underfilled. If that is fine, I will open a pull request.
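A minimal sketch of that idea, assuming a one-dimensional blocked split with ceiling division. `Slab` and `slabs_with_underfilled_last` are hypothetical names used only for illustration; they are not part of the DASH or HDF5 API, and this is not the actual DASH change. Each unit's count is clamped so that offset + count never exceeds the extent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous selection per unit.
struct Slab {
    std::size_t offset;  // first global index owned by the unit
    std::size_t count;   // number of elements owned by the unit
};

// Compute one slab per unit. Every block has the full block size except
// the last non-empty one, which is underfilled when `extent` is not
// divisible by `nunits`. Trailing units can end up with count == 0.
std::vector<Slab> slabs_with_underfilled_last(std::size_t extent,
                                              std::size_t nunits) {
    assert(nunits > 0);
    const std::size_t bs = (extent + nunits - 1) / nunits;  // ceiling division
    std::vector<Slab> slabs;
    slabs.reserve(nunits);
    for (std::size_t u = 0; u < nunits; ++u) {
        const std::size_t offset = std::min(u * bs, extent);
        const std::size_t count  = std::min(bs, extent - offset);
        slabs.push_back({offset, count});
    }
    return slabs;
}
```

For 201 elements on 2 units this yields (offset 0, count 101) and (offset 101, count 100), so the second selection ends exactly at the extent.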
Absolutely, please give it a shot :+1:
Fixed with PR #713.
@dhinf @devreal
Thanks for working on this issue. I tested the bug-dash-hdf5-pattern branch and it works.
Could you please review the merge and get the code into the development branch?
Thanks. Bin
@dhinf @devreal
I recently tested the code on 1D data and found that StoreHDF::write and StoreHDF::read still do not work.
The test code and error information are presented below.
Could you help look into this?
Bests, Bin
Test code:
#include "libdash.h"
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201

int main(int argc, char *argv[])
{
    dash::Matrix<double, 1> *h5matrix_1d = new dash::Matrix<double, 1>(dash::SizeSpec<1>(N1));
    auto myid = dash::myid();

    if (!myid)
    {
        for (int i = 0; i < N1; i++)
        {
            h5matrix_1d->at(i) = i;
        }
    }
    StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
    StoreHDF::read(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
    if (myid == 1)
    {
        for (int i = 0; i < N1; i++)
        {
            double t = h5matrix_1d->at(i);
            if (t != i)
            {
                std::cout << "Wrong result \n";
                exit(-1);
            }
        }
    }
    dash::finalize();
    return 0;
}
========== Error Info:
dbin@Bins-MBP dash % ~/work/soft/dash/build/install/bin/dash-mpiCC h5-1d.cpp -o h5-1d
In file included from h5-1d.cpp:1:
In file included from /Users/dbin/work/soft/dash/build/install//include/libdash.h:71:
In file included from /Users/dbin/work/soft/dash/build/install//include/dash/io/HDF5.h:4:
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:566:26: error: no member named 'underfilled_blocksize' in 'dash::TilePattern<1, dash::ROW_MAJOR, long>'
      } else if (pattern.underfilled_blocksize(dimensions.back()) == 0) {
                 ~~~~~~~ ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:600:12: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs_with_underfilled<dash::TilePattern<1, dash::ROW_MAJOR, long> >' requested here
    return _get_hdf_slabs_with_underfilled(pattern);
           ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/internal/DriverImplZeroCopy.h:32:21: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs<1, dash::ROW_MAJOR, long>' requested here
  auto hyperslabs = _get_hdf_slabs(container.pattern());
                    ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:762:5: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_process_dataset_impl_zero_copy<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    _process_dataset_impl_zero_copy(StoreHDF::Mode::WRITE, container, h5dset,
    ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:234:5: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_write_dataset_impl<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    _write_dataset_impl(array, h5dset, internal_type);
    ^
h5-1d.cpp:21:15: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::write<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
              ^
1 error generated.
@Goon83
I'll look into it. I need to add the missing method to the pattern. I'll try to do it this week.
Best, Denis
It is fixed and merged into development. By the way, dash::init(&argc, &argv) is missing in your example code. Please check whether the fix works in your environment.