zarr-python ZipStore fails to handle scalar string arrays

Minimal, reproducible code sample, a copy-pastable example if possible

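```python
import zarr
import numpy as np

name = 'hello'
# A 0-d (scalar) NumPy array holding a fixed-width unicode string
data = np.array('world', dtype='<U5')

store = zarr.ZipStore('test_store.zip', mode='w')
root = zarr.open(store, mode='w')  # mode='w' without a shape opens a group
zarr_array = root.create_dataset(name, data=data, shape=data.shape, dtype=data.dtype)
zarr_array[...]

# Alternative: create the array first, then assign the data
# zarr_array = root.create_dataset(name, shape=data.shape, dtype=data.dtype)
# root[name][...] = data
# zarr_array[...]

store.close()  # a ZipStore should be closed so the archive is finalized
```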

Problem description

Scalar arrays are useful as coordinates in xarray, and likely in other situations. Being able to serialize them to a zarr ZipStore would be cool!

xref: https://github.com/pydata/xarray/issues/3815

I think this works with the typical directory store.

Version and installation information

Please provide the following:

Value of zarr.__version__: 2.4.0
Value of numcodecs.__version__: 0.6.4
Version of Python interpreter: 3.7
Operating system (Linux/Windows/Mac): linux
How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda, conda-forge

Also, if you think it might be relevant, please provide the output from pip freeze or conda env export depending on which was used to install Zarr.

Mar 26 '20 04:03 hmaarrfk

Ah, I missed that this was string-related. Sorry about that. On the bright side, this may have an easy resolution.

Basically, we need an object_codec specified for anything that is not bytes-like, which includes strings. There's a good example in this string section.
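
To make the object_codec point concrete: a codec such as numcodecs' VLenUTF8 turns Python str objects into one bytes buffer that a store can write. The following is a pure-Python illustration only, with hypothetical function names, assuming a layout modeled on VLenUTF8 (a 4-byte little-endian item count, then each item's 4-byte little-endian length followed by its UTF-8 bytes). In zarr itself you would pass `object_codec=numcodecs.VLenUTF8()` together with `dtype=object` instead of hand-rolling this.

```python
import struct


def vlen_utf8_encode(items):
    """Hypothetical helper: pack a sequence of str into one bytes buffer.

    Assumed layout (modeled on numcodecs' VLenUTF8): 4-byte little-endian
    item count, then for each item a 4-byte little-endian byte length
    followed by that many UTF-8 bytes.
    """
    parts = [struct.pack("<I", len(items))]
    for item in items:
        encoded = item.encode("utf-8")
        parts.append(struct.pack("<I", len(encoded)))
        parts.append(encoded)
    return b"".join(parts)


def vlen_utf8_decode(buf):
    """Hypothetical inverse of vlen_utf8_encode: recover the list of str."""
    (n_items,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    items = []
    for _ in range(n_items):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        items.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return items


# A scalar string array holds exactly one element, so it becomes a one-item buffer.
encoded = vlen_utf8_encode(["world"])
```

Lengths are stored in bytes rather than characters, so non-ASCII strings round-trip correctly even though they take more bytes than characters.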

Mar 26 '20 05:03 jakirkham

Thoughts @hmaarrfk? 🙂

Aug 28 '20 23:08 jakirkham

I may be able to work on this stuff after October.

Thanks for looking into this with me.

Aug 29 '20 03:08 hmaarrfk

Honestly, I legitimately might have to revisit this now.

Why is this not a problem with the standard store?

Shouldn't this be handled higher up, rather than specifically in the ZipStore?

Aug 31 '20 18:08 hmaarrfk

I guess the correct place to put this is normalize_dtype:

```diff
diff --git a/zarr/util.py b/zarr/util.py
index 241009c..c432ed3 100644
--- a/zarr/util.py
+++ b/zarr/util.py
@@ -135,6 +135,9 @@ def normalize_chunks(chunks, shape, typesize):
 
 def normalize_dtype(dtype, object_codec):
 
+    # Ensure that all types of NumPy unicode strings are treated as str
+    if np.issubdtype(dtype, np.str_):
+        dtype = str
     # convenience API for object arrays
     if inspect.isclass(dtype):
         dtype = dtype.__name__
```
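
The intent of this patch can be modeled without NumPy or zarr. The sketch is hypothetical: `normalize_dtype_sketch` is not zarr's function, a regular expression on NumPy typestrings (such as `'<U5'`) stands in for `np.issubdtype`, and the `'str'` to `'vlen-utf8'` and `'bytes'` to `'vlen-bytes'` mapping is an assumption based on zarr 2.x routing those names through its "convenience API for object arrays". It shows that once any fixed-width unicode dtype is rewritten to `str`, it takes the same object-array path as an explicit `dtype=str`.

```python
import re

# Assumed codec ids for zarr's object-array convenience path.
OBJECT_CODECS = {"str": "vlen-utf8", "bytes": "vlen-bytes"}

# NumPy typestring for fixed-width unicode: optional byte-order char, "U", width.
UNICODE_TYPESTR = re.compile(r"[<>=|]?U\d*")


def normalize_dtype_sketch(dtype):
    """Hypothetical stand-in for the patched normalize_dtype.

    Returns (dtype, codec_id); codec_id is None when no object codec is needed.
    """
    # Classes such as str or bytes are reduced to their names.
    if isinstance(dtype, type):
        dtype = dtype.__name__
    # The proposed patch: any fixed-width unicode dtype takes the str path.
    if isinstance(dtype, str) and UNICODE_TYPESTR.fullmatch(dtype):
        dtype = "str"
    if dtype in OBJECT_CODECS:
        return "object", OBJECT_CODECS[dtype]
    return dtype, None


print(normalize_dtype_sketch("<U5"))
```

Numeric dtypes such as `'<f8'` pass through unchanged, so only string-like dtypes pick up an object codec.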

Aug 31 '20 18:08 hmaarrfk

Did you try using an object codec as noted here ( https://github.com/zarr-developers/zarr-python/issues/551#issuecomment-604231507 )? That's typically how we recommend handling Python objects (like str).

Aug 31 '20 19:08 jakirkham

Unfortunately, it ignores the object_codec because dtype != object.

Aug 31 '20 19:08 hmaarrfk

Recent work by @abergou may have improved the situation with object codecs.

Sep 22 '21 14:09 joshmoore

I'm guessing that refers to PR ( https://github.com/zarr-developers/zarr-python/pull/813 ), which is in Zarr 2.9.4+.

Sep 22 '21 20:09 jakirkham