[Feature]: Compatibility with Zarr 3
What would you like to see added to HDMF?
I've run a test of Zarr 3 compatibility across packages: hdmf 3.14.6 currently fails with Zarr 3 (the same hdmf version passes with Zarr 2.18.4).
Here are the test failures:
=================================== FAILURES ===================================
_______________ TestWriteHDF5withZarrInput.test_roundtrip_basic ________________
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, ...25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}
@classmethod
def __list_fill__(cls, parent, name, data, options=None):
# define the io settings and data type if necessary
io_settings = {}
dtype = None
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
# define the data shape
if 'shape' in io_settings:
data_shape = io_settings.pop('shape')
elif hasattr(data, 'shape'):
data_shape = data.shape
elif isinstance(dtype, np.dtype) and len(dtype) > 1: # check if compound dtype
data_shape = (len(data),)
else:
data_shape = get_data_shape(data)
# Create the dataset
try:
> dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: Object dtype dtype('O') has no native HDF5 equivalent
h5py/h5t.pyx:1742: TypeError
The above exception was the direct cause of the following exception:
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_basic>
def test_roundtrip_basic(self):
# Setup all the data we need
zarr.save(self.zarr_path, np.arange(50).reshape(5, 10))
zarr_data = zarr.open(self.zarr_path, 'r')
foo1 = Foo(name='foo1',
my_data=zarr_data,
attr1="I am foo1",
attr2=17,
attr3=3.14)
foobucket = FooBucket('bucket1', [foo1])
foofile = FooFile(buckets=[foobucket])
with HDF5IO(self.path, manager=self.manager, mode='w') as io:
> io.write(foofile)
tests/unit/test_io_hdf5_h5tools.py:3630:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, ...25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}
@classmethod
def __list_fill__(cls, parent, name, data, options=None):
# define the io settings and data type if necessary
io_settings = {}
dtype = None
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
# define the data shape
if 'shape' in io_settings:
data_shape = io_settings.pop('shape')
elif hasattr(data, 'shape'):
data_shape = data.shape
elif isinstance(dtype, np.dtype) and len(dtype) > 1: # check if compound dtype
data_shape = (len(data),)
else:
data_shape = get_data_shape(data)
# Create the dataset
try:
dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
except Exception as exc:
msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
(name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
> raise Exception(msg) from exc
E Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (5, 10), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
___________ TestWriteHDF5withZarrInput.test_roundtrip_empty_dataset ____________
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}
@classmethod
def __list_fill__(cls, parent, name, data, options=None):
# define the io settings and data type if necessary
io_settings = {}
dtype = None
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
# define the data shape
if 'shape' in io_settings:
data_shape = io_settings.pop('shape')
elif hasattr(data, 'shape'):
data_shape = data.shape
elif isinstance(dtype, np.dtype) and len(dtype) > 1: # check if compound dtype
data_shape = (len(data),)
else:
data_shape = get_data_shape(data)
# Create the dataset
try:
> dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: Object dtype dtype('O') has no native HDF5 equivalent
h5py/h5t.pyx:1742: TypeError
The above exception was the direct cause of the following exception:
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_empty_dataset>
def test_roundtrip_empty_dataset(self):
zarr.save(self.zarr_path, np.asarray([]).astype('int64'))
zarr_data = zarr.open(self.zarr_path, 'r')
foo1 = Foo('foo1', zarr_data, "I am foo1", 17, 3.14)
foobucket = FooBucket('bucket1', [foo1])
foofile = FooFile(buckets=[foobucket])
with HDF5IO(self.path, manager=self.manager, mode='w') as io:
> io.write(foofile)
tests/unit/test_io_hdf5_h5tools.py:3645:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}
@classmethod
def __list_fill__(cls, parent, name, data, options=None):
# define the io settings and data type if necessary
io_settings = {}
dtype = None
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
# define the data shape
if 'shape' in io_settings:
data_shape = io_settings.pop('shape')
elif hasattr(data, 'shape'):
data_shape = data.shape
elif isinstance(dtype, np.dtype) and len(dtype) > 1: # check if compound dtype
data_shape = (len(data),)
else:
data_shape = get_data_shape(data)
# Create the dataset
try:
dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
except Exception as exc:
msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
(name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
> raise Exception(msg) from exc
E Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (0,), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
_______ TestWriteHDF5withZarrInput.test_write_zarr_dataset_compress_gzip _______
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_dataset_compress_gzip>
def test_write_zarr_dataset_compress_gzip(self):
base_data = np.arange(50).reshape(5, 10).astype('float32')
zarr.save(self.zarr_path, base_data)
zarr_data = zarr.open(self.zarr_path, 'r')
> a = H5DataIO(zarr_data,
compression='gzip',
compression_opts=5,
shuffle=True,
fletcher32=True)
tests/unit/test_io_hdf5_h5tools.py:3694:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:667: in func_call
pargs = _check_args(args, kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (<hdmf.backends.hdf5.h5_utils.H5DataIO object at 0x7f271755ef50>, <Array file:///tmp/tmpuf5mlz5w shape=(5, 10) dtype=float32>)
kwargs = {'compression': 'gzip', 'compression_opts': 5, 'fletcher32': True, 'shuffle': True}
def _check_args(args, kwargs):
"""Parse and check arguments to decorated function. Raise warnings and errors as appropriate."""
# this function was separated from func_call() in order to make stepping through lines of code using pdb
# easier
parsed = __parse_args(
loc_val,
args[1:] if is_method else args,
kwargs,
enforce_type=enforce_type,
enforce_shape=enforce_shape,
allow_extra=allow_extra,
allow_positional=allow_positional
)
parse_warnings = parsed.get('future_warnings')
if parse_warnings:
msg = '%s: %s' % (func.__qualname__, ', '.join(parse_warnings))
warnings.warn(msg, category=FutureWarning, stacklevel=3)
for error_type, ExceptionType in (('type_errors', TypeError),
('value_errors', ValueError),
('syntax_errors', SyntaxError)):
parse_err = parsed.get(error_type)
if parse_err:
msg = '%s: %s' % (func.__qualname__, ', '.join(parse_err))
> raise ExceptionType(msg)
E TypeError: H5DataIO.__init__: incorrect type for 'data' (got 'Array', expected 'ndarray, list, tuple, Dataset or Iterable')
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:660: TypeError
__________ TestWriteHDF5withZarrInput.test_write_zarr_float32_dataset __________
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
> dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: Object dtype dtype('O') has no native HDF5 equivalent
h5py/h5t.pyx:1742: TypeError
The above exception was the direct cause of the following exception:
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_float32_dataset>
def test_write_zarr_float32_dataset(self):
base_data = np.arange(50).reshape(5, 10).astype('float32')
zarr.save(self.zarr_path, base_data)
zarr_data = zarr.open(self.zarr_path, 'r')
io = HDF5IO(self.path, mode='a')
f = io._file
> io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))
tests/unit/test_io_hdf5_h5tools.py:3671:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
except Exception as exc:
msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
> raise Exception(msg) from exc
E Exception: Could not create scalar dataset test_dataset in /
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
___________ TestWriteHDF5withZarrInput.test_write_zarr_int32_dataset ___________
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
> dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: Object dtype dtype('O') has no native HDF5 equivalent
h5py/h5t.pyx:1742: TypeError
The above exception was the direct cause of the following exception:
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_int32_dataset>
def test_write_zarr_int32_dataset(self):
base_data = np.arange(50).reshape(5, 10).astype('int32')
zarr.save(self.zarr_path, base_data)
zarr_data = zarr.open(self.zarr_path, 'r')
io = HDF5IO(self.path, mode='a')
f = io._file
> io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))
tests/unit/test_io_hdf5_h5tools.py:3657:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
except Exception as exc:
msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
> raise Exception(msg) from exc
E Exception: Could not create scalar dataset test_dataset in /
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
__________ TestWriteHDF5withZarrInput.test_write_zarr_string_dataset ___________
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
> dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: Object dtype dtype('O') has no native HDF5 equivalent
h5py/h5t.pyx:1742: TypeError
The above exception was the direct cause of the following exception:
self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_string_dataset>
def test_write_zarr_string_dataset(self):
base_data = np.array(['string1', 'string2'], dtype=str)
zarr.save(self.zarr_path, base_data)
zarr_data = zarr.open(self.zarr_path, 'r')
io = HDF5IO(self.path, mode='a')
f = io._file
> io.write_dataset(f, DatasetBuilder('test_dataset', zarr_data, attributes={}))
tests/unit/test_io_hdf5_h5tools.py:3685:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}
@classmethod
def __scalar_fill__(cls, parent, name, data, options=None):
dtype = None
io_settings = {}
if options is not None:
dtype = options.get('dtype')
io_settings = options.get('io_settings')
if not isinstance(dtype, type):
try:
dtype = cls.__resolve_dtype__(dtype, data)
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
try:
dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
except Exception as exc:
msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
> raise Exception(msg) from exc
E Exception: Could not create scalar dataset test_dataset in /
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
Strangely, the failures all surface in the HDF5 backend, but I'd guess the real cause is a change in how Zarr 3 arrays present themselves to hdmf, which prevents their data from being written to HDF5.
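To isolate the HDF5-side symptom: the tracebacks show `options = {'dtype': <class 'numpy.ndarray'>, ...}`, i.e. hdmf resolved the Zarr 3 array's dtype to the `numpy.ndarray` class itself. A minimal sketch of just that failure mode (the file name is a placeholder, and this is my reading of the logs, not a confirmed diagnosis):

```python
import numpy as np
import h5py

# The tracebacks show hdmf passing dtype=<class 'numpy.ndarray'> through to
# h5py. numpy coerces that class to the generic object dtype...
print(np.dtype(np.ndarray))  # dtype('O')

# ...which h5py cannot map to a native HDF5 type, reproducing the error above:
# TypeError: Object dtype dtype('O') has no native HDF5 equivalent
with h5py.File("repro.h5", "w") as f:
    f.create_dataset("my_data", shape=(5, 10), dtype=np.ndarray)
```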
What solution would you like?
I believe the relevant breaking changes are described in the Zarr v3 migration guide: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html
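As an illustration of the kind of change the guide covers: zarr 3 reorganized the package, so code relying on zarr 2 internals needs a version-agnostic check. A hedged sketch (not hdmf's actual code; `zarr.Array` is the public array class in both major versions):

```python
def is_zarr_array(obj) -> bool:
    """Return True if obj is a zarr array under zarr-python 2 or 3."""
    try:
        import zarr
    except ImportError:
        return False
    # zarr.Array is exported at the top level in both v2 and v3; internal
    # paths such as zarr.core.Array (v2) moved in v3.
    return isinstance(obj, zarr.Array)
```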
Do you have any interest in helping implement the feature?
No.
hdmf does not yet support zarr v3, which introduces a number of breaking changes that affect the hdmf package. The upcoming release of hdmf 4.0.0 will set the upper bound of the optional zarr dependency to <3 until v3 support is added.
See also the related but separate effort to add support for zarr v3 in hdmf-zarr: https://github.com/hdmf-dev/hdmf-zarr/issues/202
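Until then, downstream code that wants to fail fast can check the installed zarr version at import time; a minimal sketch (hdmf itself will enforce this through the dependency bound in its packaging metadata, not code like this):

```python
from importlib.metadata import version

from packaging.version import Version

# Refuse to run against an unsupported zarr; hdmf 4.0.0 will instead pin the
# optional dependency to "zarr<3".
if Version(version("zarr")) >= Version("3"):
    raise ImportError("zarr>=3 detected; this code path currently requires zarr<3")
```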
Is there a timeline for having this addressed? We are in a pickle now: we cannot support zarr v3 at the dandi level, since hdmf requires zarr v2, and hdmf is in turn required by pynwb.
The current plan is to wait for the ZEP process for extending data types to be updated, and then discuss our use cases for structured arrays and variable-length strings with the Zarr community. We have been monitoring the relevant issues in zarr on and off; I haven't seen anything new there.
What is the timeline for dandi needing zarr v3 support? Can dandi support zarr v3 for non-nwb data and zarr v2 for nwb data simultaneously?
FYI, zarr python 3.1.0, which we are aiming to release this week, will support structured dtypes for both zarr v2 and v3 data. Support for variable-length strings has been in zarr python 3.x for some time.
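Once 3.1.0 is out, both features can be exercised with the same `zarr.save`/`zarr.open` calls used in the tests above; a quick sketch (paths are placeholders):

```python
import numpy as np
import zarr

# Structured (compound) dtypes: supported for zarr v2 and v3 data as of
# zarr-python 3.1.
compound = np.dtype([("x", "int32"), ("y", "float64")])
zarr.save("structured.zarr", np.zeros(5, dtype=compound))
print(zarr.open("structured.zarr", mode="r").dtype)

# Variable-length strings: already supported in earlier zarr-python 3.x.
zarr.save("strings.zarr", np.array(["string1", "string2"], dtype=str))
print(zarr.open("strings.zarr", mode="r")[:])
```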
Thanks for the update @d-v-b! We will try it out when zarr python 3.1 is released (it looks like https://github.com/zarr-developers/zarr-python/pull/2874 isn't released yet but is on the way soon).
zarr python 3.1 is out; please give it a try and let me know if anything needs improvement.
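For anyone retrying, a minimal smoke test of one failing path, adapted from `test_write_zarr_float32_dataset` in the tracebacks above (file paths are placeholders):

```python
import numpy as np
import zarr
from hdmf.backends.hdf5 import HDF5IO
from hdmf.build import DatasetBuilder

zarr.save("data.zarr", np.arange(50).reshape(5, 10).astype("float32"))
zarr_data = zarr.open("data.zarr", "r")

io = HDF5IO("test.h5", mode="a")
f = io._file
# Under zarr 3.0.x this raised "Object dtype dtype('O') has no native HDF5
# equivalent"; rerun after upgrading zarr to see whether it now succeeds.
io.write_dataset(f, DatasetBuilder(name="test_dataset", data=zarr_data, attributes={}))
io.close()
```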