
iterate over chunks on dataset initialization

jananzhu opened this pull request on Jan 11, 2021 · 1 comment

Fixes #88. Hi, we are running into the above issue: the HSDS server returns 413 errors when we try to write HDF5 files of around 200 MB to HSDS. We've implemented the solution suggested in that issue, and it resolves the problem for us.
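For context, the idea behind the fix is to split a large write into one request per chunk rather than a single oversized PUT. The sketch below is a simplified stand-in for h5pyd's ChunkIterator (the real class lives in h5pyd/_apps/chunkiter.py); the function name and the scalar-shape guard are illustrative, not the actual implementation:

```python
import numpy as np

def iter_chunks(shape, layout):
    """Yield slice tuples covering `shape` in blocks of `layout`.

    Simplified sketch of a chunk iterator; not h5pyd's actual code.
    """
    if shape == ():  # scalar dataset: a single empty selection
        yield ()
        return
    index = [0] * len(shape)
    while True:
        yield tuple(slice(i, min(i + c, n))
                    for i, c, n in zip(index, layout, shape))
        # advance the chunk index, last dimension varying fastest
        for dim in reversed(range(len(shape))):
            index[dim] += layout[dim]
            if index[dim] < shape[dim]:
                break
            index[dim] = 0
        else:
            return  # every dimension wrapped: iteration is complete

# write a large array one chunk-sized request at a time
data = np.arange(1000).reshape(100, 10)
for sel in iter_chunks(data.shape, (25, 10)):
    pass  # e.g. dset[sel] = data[sel]; each request stays small
```

Each yielded selection covers exactly one chunk, so no single HTTP request to HSDS carries more than one chunk's worth of data.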

jananzhu · Jan 11, 2021

create_dataset is failing for scalar datasets after this change, but I think the change is just uncovering an existing issue in ChunkIterator's handling of scalar datasets.

Traceback (most recent call last):
  File "test_complex_numbers.py", line 57, in test_complex_attr
    dset = f.create_dataset('x', data=5)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/h5pyd-0.8.2-py3.7.egg/h5pyd/_hl/group.py", line 338, in create_dataset
    for chunk in it:
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/h5pyd-0.8.2-py3.7.egg/h5pyd/_apps/chunkiter.py", line 111, in __next__
    if self._chunk_index[0] * self._layout[0] >= self._shape[0]:
IndexError: tuple index out of range

The if self._layout == () block in ChunkIterator.__next__ looks like it's intended to catch this case before the code in the traceback is reached, but HSDS currently reports a chunk layout of (1,) for a scalar dataset, so that check never fires. Perhaps the check could be replaced by if self._shape == ()?
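The failure mode can be reduced to a few lines. This is a hypothetical reconstruction, not h5pyd's code: the plain variables shape and layout stand in for self._shape and self._layout in chunkiter.py:

```python
# What HSDS reports for a scalar dataset: empty shape, but layout (1,)
shape, layout = (), (1,)

# Current guard keys on the layout; layout is (1,), so it is skipped.
scalar_caught_by_layout_check = (layout == ())

# With the guard skipped, __next__ goes on to index shape[0],
# which raises for an empty tuple (as in the traceback).
hit_index_error = False
try:
    _ = layout[0] >= shape[0]
except IndexError:
    hit_index_error = True

# Proposed guard keys on the shape, which does identify a scalar.
scalar_caught_by_shape_check = (shape == ())
```

Checking the shape rather than the layout means the iterator's scalar path is taken regardless of how the server chooses to report chunk layout for zero-dimensional data.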

jananzhu · Jan 12, 2021