Write HDF5 Error for large matrix

Open Goon83 opened this issue 4 years ago • 14 comments

Hi all, I hope you are doing well in the current epidemic situation. I understand that you may have more important things to do right now; I am posting this error report here in case you get a chance to look into it.

Recently I tested the function StoreHDF::write and found that it works for a small matrix but fails for a large array on multiple processes (CPU units). Below is the code:

#include <libdash.h>
#include <cstdlib>
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201
#define N2 15000
int main(int argc, char *argv[])
{
	dash::init(&argc, &argv);

	dash::Matrix<double, 2> *h5matrix = new dash::Matrix<double, 2>(dash::SizeSpec<2>(N1, N2));

	auto myid = dash::myid();

	if (!myid)
	{
		for (int i = 0; i < N1; i++)
		{
			for (int j = 0; j < N2; j++)
				h5matrix->at(i, j) = i + j;
		}
	}
	StoreHDF::write(*h5matrix, "testf.h5", "testg/testd2D");
	StoreHDF::read(*h5matrix, "testf.h5", "testg/testd2D");

	if (myid == 1)
	{
		for (int i = 0; i < N1; i++)
		{
			for (int j = 0; j < N2; j++)
			{
				double t = h5matrix->at(i, j);
				if (t != (i + j))
				{
					std::cout << "Wrong result \n";
					exit(-1);
				}
			}
		}
	}
	dash::finalize();

	return 0;
}

I compiled the code and ran it with 2 processes, and it reports the error below. Note that if you change N1 and N2 in the code to small numbers, e.g. 10 by 10, it works.

>> mpirun  -n 2 ./h5-test
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 1:
  #000: H5Dio.c line 322 in H5Dwrite(): could not get a validated dataspace from file_space_id
    major: Invalid arguments to routine
    minor: Bad value
  #001: H5S.c line 254 in H5S_get_validated_dataspace(): selection + offset not within extent
    major: Dataspace
    minor: Out of range
^C[mpiexec@dbinMac] Sending Ctrl-C to processes as requested
[mpiexec@dbinMac] Press Ctrl-C again to force abort

Goon83 avatar Mar 26 '20 18:03 Goon83

Hi DASH community, just checking whether someone can help look into this issue.

Bests, Bin

Goon83 avatar Jun 12 '20 16:06 Goon83

I will look into it this week.

dhinf avatar Jun 17 '20 10:06 dhinf

The problem is not the total size of the NArray but the size of its extents: 21 x 20, for example, also results in an error. With 21 elements in the first dimension, the first rank holds 11 elements but the second one only 10, and this underfilled block seems to be the problem. I tried it with the output stream, which should give the same result in the end.


#include <libdash.h>
#include <cstdlib>
#include <string>

#define N1 200
#define N2 15000

#define FILENAME "example.hdf5"

int main(int argc, char *argv[])
{
  dash::init(&argc, &argv);
  dash::Matrix<double, 2> h5matrix(dash::SizeSpec<2>(N1, N2));
  auto myid = dash::myid();

  if (!myid) {
    for (int i = 0; i < N1; i++) {
      for (int j = 0; j < N2; j++)
        h5matrix.at(i, j) = i + j;
    }
  }
  dash::io::hdf5::OutputStream os(FILENAME);
  os << dash::io::hdf5::dataset("group/data") << h5matrix;

  dash::barrier();
  if(dash::myid() == 0){
    std::string syscall = "h5dump ";
    auto status = system((syscall + FILENAME).c_str());
  }
  dash::finalize();

 return 0;
}

dhinf avatar Jun 18 '20 10:06 dhinf

It is a bug in the TilePattern. If you use the proxy dash::NArray instead of dash::Matrix, it should work: dash::Matrix uses the TilePattern by default, while dash::NArray uses a BlockPattern instead. That is the only difference.

dhinf avatar Jun 19 '20 11:06 dhinf

A small workaround until we have fixed the pattern.

dhinf avatar Jun 19 '20 11:06 dhinf

I fixed it, but if you compile DASH with assertions enabled you will get an error when using the TilePattern with underfilled blocks. @devreal and @fuchsto: Why can't a TilePattern have underfilled blocks? What was the reason to forbid it?

dhinf avatar Jun 19 '20 15:06 dhinf

I believe that is a longstanding issue that has never been properly implemented. If someone has a patch I would love that...

devreal avatar Jun 19 '20 16:06 devreal

The solution would be the same as for the BlockPattern: only the last block is underfilled. If that is fine, I will open a pull request.

dhinf avatar Jun 22 '20 08:06 dhinf

Absolutely, please give it a shot :+1:

devreal avatar Jun 22 '20 09:06 devreal

Fixed with PR #713.

dhinf avatar Jun 26 '20 15:06 dhinf

@dhinf @devreal

Thanks for working on this issue. Tested the bug-dash-hdf5-pattern branch and it works.

Could you please review the merge and get the code into the development branch?

Thanks. Bin

Goon83 avatar Jan 05 '21 23:01 Goon83

@dhinf @devreal

I recently tested the code on 1D data and found that StoreHDF::write and StoreHDF::read still do not work. The test code and error information are given below.
Could you help look into this?

Bests, Bin

Test code:

#include "libdash.h"
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201

int main(int argc, char *argv[])
{
    dash::Matrix<double, 1> *h5matrix_1d = new dash::Matrix<double, 1>(dash::SizeSpec<1>(N1));
    auto myid = dash::myid();

    if (!myid)
    {
        for (int i = 0; i < N1; i++)
        {
            h5matrix_1d->at(i) = i;
        }
    }
    StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
    StoreHDF::read(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");

    if (myid == 1)
    {
        for (int i = 0; i < N1; i++)
        {
            double t = h5matrix_1d->at(i);
            if (t != i)
            {
                std::cout << "Wrong result \n";
                exit(-1);
            }
        }
    }

    dash::finalize();

    return 0;
}

========== Error Info:

dbin@Bins-MBP dash % ~/work/soft/dash/build/install/bin/dash-mpiCC h5-1d.cpp -o h5-1d
In file included from h5-1d.cpp:1:
In file included from /Users/dbin/work/soft/dash/build/install//include/libdash.h:71:
In file included from /Users/dbin/work/soft/dash/build/install//include/dash/io/HDF5.h:4:
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:566:26: error: no member named 'underfilled_blocksize' in 'dash::TilePattern<1, dash::ROW_MAJOR, long>'
      } else if (pattern.underfilled_blocksize(dimensions.back()) == 0) {
                 ~~~~~~~ ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:600:12: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs_with_underfilled<dash::TilePattern<1, dash::ROW_MAJOR, long> >' requested here
    return _get_hdf_slabs_with_underfilled(pattern);
           ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/internal/DriverImplZeroCopy.h:32:21: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs<1, dash::ROW_MAJOR, long>' requested here
  auto hyperslabs = _get_hdf_slabs(container.pattern());
                    ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:762:5: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_process_dataset_impl_zero_copy<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    _process_dataset_impl_zero_copy(StoreHDF::Mode::WRITE, container, h5dset,
    ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:234:5: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::_write_dataset_impl<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    _write_dataset_impl(array, h5dset, internal_type);
    ^
h5-1d.cpp:21:15: note: in instantiation of function template specialization 'dash::io::hdf5::StoreHDF::write<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
    StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
              ^
1 error generated.

Goon83 avatar Apr 26 '21 05:04 Goon83

@Goon83

I'll look into it. I need to add the missing method to the pattern. I'll try to do it this week.

Best, Denis

dhinf avatar Apr 27 '21 14:04 dhinf

It is fixed and merged into development. By the way, dash::init(&argc, &argv) is missing in your example code. Please check whether the fix works in your environment.

dhinf avatar Apr 28 '21 08:04 dhinf