
[Feature]: Compatibility with Zarr 3

[Open] QuLogic opened this issue on Jan 20, 2025 · 6 comments

What would you like to see added to HDMF?

I've run a test of Zarr 3 compatibility across packages, and hdmf 3.14.6 currently fails with Zarr 3 (the same version passes with Zarr 2.18.4).

Here are the test failures:
=================================== FAILURES ===================================
_______________ TestWriteHDF5withZarrInput.test_roundtrip_basic ________________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, ...25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
>           dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_basic>

    def test_roundtrip_basic(self):
        # Setup all the data we need
        zarr.save(self.zarr_path, np.arange(50).reshape(5, 10))
        zarr_data = zarr.open(self.zarr_path, 'r')
        foo1 = Foo(name='foo1',
                   my_data=zarr_data,
                   attr1="I am foo1",
                   attr2=17,
                   attr3=3.14)
        foobucket = FooBucket('bucket1', [foo1])
        foofile = FooFile(buckets=[foobucket])
    
        with HDF5IO(self.path, manager=self.manager, mode='w') as io:
>           io.write(foofile)

tests/unit/test_io_hdf5_h5tools.py:3630: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
    super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
    self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
    self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
    self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
    dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, ...25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
            dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
                  (name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
>           raise Exception(msg) from exc
E           Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (5, 10), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
___________ TestWriteHDF5withZarrInput.test_roundtrip_empty_dataset ____________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
>           dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_empty_dataset>

    def test_roundtrip_empty_dataset(self):
        zarr.save(self.zarr_path, np.asarray([]).astype('int64'))
        zarr_data = zarr.open(self.zarr_path, 'r')
        foo1 = Foo('foo1', zarr_data, "I am foo1", 17, 3.14)
        foobucket = FooBucket('bucket1', [foo1])
        foofile = FooFile(buckets=[foobucket])
    
        with HDF5IO(self.path, manager=self.manager, mode='w') as io:
>           io.write(foofile)

tests/unit/test_io_hdf5_h5tools.py:3645: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
    super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
    self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
    self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
    self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
    dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
            dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
                  (name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
>           raise Exception(msg) from exc
E           Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (0,), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
_______ TestWriteHDF5withZarrInput.test_write_zarr_dataset_compress_gzip _______

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_dataset_compress_gzip>

    def test_write_zarr_dataset_compress_gzip(self):
        base_data = np.arange(50).reshape(5, 10).astype('float32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
>       a = H5DataIO(zarr_data,
                     compression='gzip',
                     compression_opts=5,
                     shuffle=True,
                     fletcher32=True)

tests/unit/test_io_hdf5_h5tools.py:3694: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:667: in func_call
    pargs = _check_args(args, kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<hdmf.backends.hdf5.h5_utils.H5DataIO object at 0x7f271755ef50>, <Array file:///tmp/tmpuf5mlz5w shape=(5, 10) dtype=float32>)
kwargs = {'compression': 'gzip', 'compression_opts': 5, 'fletcher32': True, 'shuffle': True}

    def _check_args(args, kwargs):
        """Parse and check arguments to decorated function. Raise warnings and errors as appropriate."""
        # this function was separated from func_call() in order to make stepping through lines of code using pdb
        # easier
    
        parsed = __parse_args(
            loc_val,
            args[1:] if is_method else args,
            kwargs,
            enforce_type=enforce_type,
            enforce_shape=enforce_shape,
            allow_extra=allow_extra,
            allow_positional=allow_positional
        )
    
        parse_warnings = parsed.get('future_warnings')
        if parse_warnings:
            msg = '%s: %s' % (func.__qualname__, ', '.join(parse_warnings))
            warnings.warn(msg, category=FutureWarning, stacklevel=3)
    
        for error_type, ExceptionType in (('type_errors', TypeError),
                                          ('value_errors', ValueError),
                                          ('syntax_errors', SyntaxError)):
            parse_err = parsed.get(error_type)
            if parse_err:
                msg = '%s: %s' % (func.__qualname__, ', '.join(parse_err))
>               raise ExceptionType(msg)
E               TypeError: H5DataIO.__init__: incorrect type for 'data' (got 'Array', expected 'ndarray, list, tuple, Dataset or Iterable')

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:660: TypeError
__________ TestWriteHDF5withZarrInput.test_write_zarr_float32_dataset __________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_float32_dataset>

    def test_write_zarr_float32_dataset(self):
        base_data = np.arange(50).reshape(5, 10).astype('float32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3671: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
___________ TestWriteHDF5withZarrInput.test_write_zarr_int32_dataset ___________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_int32_dataset>

    def test_write_zarr_int32_dataset(self):
        base_data = np.arange(50).reshape(5, 10).astype('int32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3657: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
__________ TestWriteHDF5withZarrInput.test_write_zarr_string_dataset ___________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_string_dataset>

    def test_write_zarr_string_dataset(self):
        base_data = np.array(['string1', 'string2'], dtype=str)
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder('test_dataset', zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3685: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception

Strangely, the failures all seem to be in the HDF5 backend, but I'd guess it's really some change in Zarr that produces data that can't be saved to HDF5.
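
For what it's worth, a minimal probe (hypothetical paths; assumes zarr >= 3 and numpy) hints at where the object dtype comes from:

    import numpy as np
    import zarr

    # Recreate the array from the failing roundtrip test.
    zarr.save("/tmp/demo.zarr", np.arange(50).reshape(5, 10))
    arr = zarr.open("/tmp/demo.zarr", mode="r")

    # The zarr 3 Array no longer satisfies hdmf's type checks (see the
    # H5DataIO docval failure above: "got 'Array', expected 'ndarray,
    # list, tuple, Dataset or Iterable'"), so the data falls through to
    # generic handling.
    print(type(arr))

    # Indexing a 2-D zarr array yields a whole ndarray row, so inferring
    # the dtype from the first element gives the numpy.ndarray class,
    # matching options={'dtype': <class 'numpy.ndarray'>} in the
    # tracebacks above, which h5py then maps to object dtype.
    print(type(arr[0]))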

What solution would you like?

See https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html I believe.

Do you have any interest in helping implement the feature?

No.

QuLogic (Jan 20, 2025)

hdmf does not yet support zarr v3, which introduces a number of breaking changes that affect the hdmf package. The upcoming release of hdmf 4.0.0 will set the upper bound of the optional zarr dependency to <3 until v3 support is added.

See also the related but separate effort to add support for zarr v3 in hdmf-zarr: https://github.com/hdmf-dev/hdmf-zarr/issues/202
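
For downstream projects that need to stay on the v2 series in the meantime, the workaround is just an upper bound on zarr (illustrative; hdmf 4.0.0 will carry the equivalent constraint in its own metadata):

    pip install "zarr<3"
    # or in a project's pyproject.toml (illustrative):
    # dependencies = ["hdmf", "zarr<3"]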

rly (Jan 21, 2025)

Is there a timeline for having this addressed? We are now in a pickle: we cannot support zarr v3 at the dandi level, because hdmf requires zarr v2, and hdmf is in turn required for pynwb.

yarikoptic (Apr 8, 2025)

The current plan is to wait for the ZEP process for extending data types to be updated and discuss with the Zarr community our use cases for structured arrays and variable-length strings. We have been monitoring the relevant issues in zarr on and off. I haven't seen anything new there.

What is the timeline for dandi needing zarr v3 support? Can dandi support zarr v3 for non-nwb data and zarr v2 for nwb data simultaneously?

rly (Apr 8, 2025)

FYI, zarr python 3.1.0, which we are aiming to release this week, will support structured dtypes for both zarr v2 and v3 data. Support for variable-length strings has already been in zarr python 3.x for some time.
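
Once 3.1.0 is out, a quick smoke test mirroring the failing cases above might look like this (hypothetical paths; whether these round-trip cleanly is exactly what it would verify):

    import numpy as np
    import zarr

    # Structured (compound) dtype round-trip, new in zarr python 3.1.0
    compound = np.array([(1, 2.0), (3, 4.0)], dtype=[("a", "i4"), ("b", "f8")])
    zarr.save("/tmp/compound.zarr", compound)
    print(zarr.open("/tmp/compound.zarr", mode="r")[:])

    # String round-trip, mirroring test_write_zarr_string_dataset above
    strings = np.array(["string1", "string2"], dtype=str)
    zarr.save("/tmp/strings.zarr", strings)
    print(zarr.open("/tmp/strings.zarr", mode="r")[:])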

d-v-b (Jun 30, 2025)

Thanks for the update @d-v-b! We will try it out when zarr python 3.1 is released (it looks like https://github.com/zarr-developers/zarr-python/pull/2874 isn't released yet but is on the way soon).

rly (Jul 8, 2025)

zarr python 3.1 is out. Please give it a try and let me know if anything needs improvement.
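
For anyone following along, trying it against the failing tests above would look something like this (assumes a development checkout of hdmf with its test dependencies installed):

    pip install --upgrade "zarr>=3.1"
    pytest tests/unit/test_io_hdf5_h5tools.py -k TestWriteHDF5withZarrInput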

d-v-b (Jul 15, 2025)