zarr-python
zarr-python copied to clipboard
Error when using attribute keys with mixed types
Minimal, reproducible code sample, a copy-pastable example if possible
import zarr
z1 = zarr.open('data/example.zarr', mode='w', shape=(10, 10), dtype='i4')
z1.attrs.put({1: "bar"}) # OK
z1.attrs.put({"foo": "bar"}) # OK
z1.attrs.put({1: "bar", "foo": "bar"}) # Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 z1.attrs.put({1: "bar", "foo": "bar"})
File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:109, in Attributes.put(self, d)
106 def put(self, d):
107 """Overwrite all attributes with the key/value pairs in the provided dictionary
108 `d` in a single operation."""
--> 109 self._write_op(self._put_nosync, d)
File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:73, in Attributes._write_op(self, f, *args, **kwargs)
71 # synchronization
72 if self.synchronizer is None:
---> 73 return f(*args, **kwargs)
74 else:
75 with self.synchronizer[self.key]:
File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:112, in Attributes._put_nosync(self, d)
111 def _put_nosync(self, d):
--> 112 self.store[self.key] = json_dumps(d)
113 if self.cache:
114 self._cached_asdict = d
File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/util.py:38, in json_dumps(o)
36 def json_dumps(o: Any) -> bytes:
37 """Write JSON in a consistent, human-readable way."""
---> 38 return json.dumps(o, indent=4, sort_keys=True, ensure_ascii=True,
39 separators=(',', ': ')).encode('ascii')
File ~/mambaforge/envs/xarray/lib/python3.9/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
232 if cls is None:
233 cls = JSONEncoder
--> 234 return cls(
235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
237 separators=separators, default=default, sort_keys=sort_keys,
238 **kw).encode(obj)
File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:201, in JSONEncoder.encode(self, o)
199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
--> 201 chunks = list(chunks)
202 return ''.join(chunks)
File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
429 yield from _iterencode_list(o, _current_indent_level)
430 elif isinstance(o, dict):
--> 431 yield from _iterencode_dict(o, _current_indent_level)
432 else:
433 if markers is not None:
File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:353, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
351 first = True
352 if _sort_keys:
--> 353 items = sorted(dct.items())
354 else:
355 items = dct.items()
TypeError: '<' not supported between instances of 'str' and 'int'
Problem description
I'm not able to assign attributes using keys with different types. Looks like the issue is coming from json that is trying to sort the keys. I guess there are 3 options:
- Only strings are supported: Raise an error if unsupported types are used?
- Mixed types are not supported: Raise an error if mixed types are used?
- It's a bug: Do not sort the attributes by default?
Version and installation information
- Value of
zarr.__version__: 2.11.3 - Value of
numcodecs.__version__: 0.9.1 - Version of Python interpreter: 3.9.12
- Operating system (Linux/Windows/Mac): Mac
- How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda
conda env export
name: xarray channels:
- conda-forge dependencies:
- affine=2.3.1=pyhd8ed1ab_0
- aiobotocore=2.3.2=pyhd8ed1ab_0
- aiohttp=3.8.1=py39h63b48b0_1
- aioitertools=0.10.0=pyhd8ed1ab_0
- aiosignal=1.2.0=pyhd8ed1ab_0
- antlr-python-runtime=4.7.2=py39h6e9494a_1003
- appdirs=1.4.4=pyh9f0ad1d_0
- appnope=0.1.3=pyhd8ed1ab_0
- asciitree=0.3.3=py_2
- asttokens=2.0.5=pyhd8ed1ab_0
- async-timeout=4.0.2=pyhd8ed1ab_0
- attrs=21.4.0=pyhd8ed1ab_0
- backcall=0.2.0=pyh9f0ad1d_0
- backports=1.0=py_2
- backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
- blosc=1.21.1=h97e831e_3
- boost-cpp=1.74.0=hdbf7018_7
- boto3=1.21.21=pyhd8ed1ab_0
- botocore=1.24.21=pyhd8ed1ab_1
- bottleneck=1.3.4=py39h86b5767_1
- brotli=1.0.9=h5eb16cf_7
- brotli-bin=1.0.9=h5eb16cf_7
- brotlipy=0.7.0=py39h63b48b0_1004
- bzip2=1.0.8=h0d85af4_4
- c-ares=1.18.1=h0d85af4_0
- ca-certificates=2022.5.18.1=h033912b_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cairo=1.16.0=h9e0e54b_1010
- cartopy=0.20.2=py39hd74d15e_4
- cdat_info=8.2.1=pyhd8ed1ab_2
- cdms2=3.1.5=py39ha4b5fb4_14
- cdtime=3.1.4=py39hef3cfc1_7
- certifi=2022.5.18.1=py39h6e9494a_0
- cf-units=3.0.1=py39hc89836e_2
- cffi=1.15.0=py39he338e87_0
- cfgrib=0.9.10.1=pyhd8ed1ab_0
- cfgv=3.3.1=pyhd8ed1ab_0
- cfitsio=4.1.0=h2c97ad1_0
- cftime=1.6.0=py39h86b5767_1
- charset-normalizer=2.0.12=pyhd8ed1ab_0
- click=8.1.3=py39h6e9494a_0
- click-plugins=1.1.1=py_0
- cligj=0.7.2=pyhd8ed1ab_1
- cloudpickle=2.1.0=pyhd8ed1ab_0
- coverage=6.4=py39h701faf5_0
- cryptography=37.0.2=py39h9c2a9ce_0
- curl=7.83.1=h372c54d_0
- cycler=0.11.0=pyhd8ed1ab_0
- cytoolz=0.11.2=py39h63b48b0_2
- dask-core=2022.5.1=pyhd8ed1ab_0
- decorator=5.1.1=pyhd8ed1ab_0
- distarray=2.12.2=pyhd8ed1ab_2
- distlib=0.3.4=pyhd8ed1ab_0
- distributed=2022.5.1=pyhd8ed1ab_0
- eccodes=2.26.0=hf772d56_0
- esmf=8.2.0=nompi_h92c19ad_0
- esmpy=8.2.0=nompi_py39h8ffa011_1
- execnet=1.9.0=pyhd8ed1ab_0
- executing=0.8.3=pyhd8ed1ab_0
- expat=2.4.8=h96cf925_0
- fasteners=0.17.3=pyhd8ed1ab_0
- filelock=3.7.0=pyhd8ed1ab_0
- findlibs=0.0.2=pyhd8ed1ab_0
- flox=0.5.3=pyhd8ed1ab_0
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=hab24e00_0
- fontconfig=2.14.0=h676cef8_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- fonttools=4.33.3=py39h701faf5_0
- freetype=2.10.4=h4cff582_1
- freexl=1.0.6=h0d85af4_0
- frozenlist=1.3.0=py39h63b48b0_1
- fsspec=2022.5.0=pyhd8ed1ab_0
- future=0.18.2=py39h6e9494a_5
- g2clib=1.6.3=h7284f3b_1
- geos=3.10.2=he49afe7_0
- geotiff=1.7.1=had63758_1
- gettext=0.19.8.1=hd1a6beb_1008
- giflib=5.2.1=hbcb3906_2
- h5netcdf=1.0.0=pyhd8ed1ab_0
- h5py=3.6.0=nompi_py39hbc6cb89_100
- hdf4=4.2.15=hefd3b78_3
- hdf5=1.12.1=nompi_ha60fbc9_104
- heapdict=1.0.1=py_0
- hypothesis=6.46.8=pyhd8ed1ab_0
- icu=69.1=he49afe7_0
- identify=2.5.1=pyhd8ed1ab_0
- idna=3.3=pyhd8ed1ab_0
- importlib-metadata=4.11.4=py39h6e9494a_0
- importlib_metadata=4.11.4=hd8ed1ab_0
- importlib_resources=5.7.1=pyhd8ed1ab_1
- iniconfig=1.1.1=pyh9f0ad1d_0
- ipython=8.3.0=py39h6e9494a_0
- iris=3.2.1=pyhd8ed1ab_0
- jasper=2.0.33=h013e400_0
- jedi=0.18.1=py39h6e9494a_1
- jinja2=3.1.2=pyhd8ed1ab_0
- jmespath=1.0.0=pyhd8ed1ab_0
- jpeg=9e=h5eb16cf_1
- json-c=0.16=h01d06f9_0
- jsonschema=4.5.1=pyhd8ed1ab_0
- jupyter_core=4.10.0=py39h6e9494a_0
- kealib=1.4.14=ha22a8b1_3
- kiwisolver=1.4.2=py39h7248d28_1
- krb5=1.19.3=hb49756b_0
- lazy-object-proxy=1.7.1=py39h63b48b0_1
- lcms2=2.12=h577c468_0
- lerc=3.0=he49afe7_0
- libaec=1.0.6=he49afe7_0
- libblas=3.9.0=14_osx64_openblas
- libbrotlicommon=1.0.9=h5eb16cf_7
- libbrotlidec=1.0.9=h5eb16cf_7
- libbrotlienc=1.0.9=h5eb16cf_7
- libcblas=3.9.0=14_osx64_openblas
- libcdms=3.1.2=h7d8e03a_117
- libcf=1.0.3=py39h8d2e2c5_114
- libcurl=7.83.1=h372c54d_0
- libcxx=14.0.3=hc203e6f_0
- libdap4=3.20.6=h3e144a0_2
- libdeflate=1.10=h0d85af4_0
- libdrs=3.1.2=h102a9a4_118
- libdrs_f=3.1.2=h2534672_114
- libedit=3.1.20191231=h0678c8f_2
- libev=4.33=haf1e3a3_1
- libffi=3.4.2=h0d85af4_5
- libgdal=3.4.3=h6474519_0
- libgfortran=5.0.0=9_3_0_h6c81a4c_23
- libgfortran5=9.3.0=h6c81a4c_23
- libglib=2.70.2=hf1fb8c0_4
- libiconv=1.16=haf1e3a3_0
- libkml=1.3.0=h8fd9edb_1014
- liblapack=3.9.0=14_osx64_openblas
- libllvm10=10.0.1=h009f743_3
- libnetcdf=4.8.1=nompi_h6609ca0_102
- libnghttp2=1.47.0=h942079c_0
- libopenblas=0.3.20=openmp_hb3cd9ec_0
- libpng=1.6.37=h7cec526_2
- libpq=14.2=hea3049e_0
- librttopo=1.1.0=hec60dd8_9
- libspatialite=5.0.1=hadde3e2_15
- libssh2=1.10.0=h52ee1ee_2
- libtiff=4.3.0=hfca7e8f_4
- libuuid=2.32.1=h35c211d_1000
- libwebp=1.2.2=h28dabe5_0
- libwebp-base=1.2.2=h0d85af4_1
- libxcb=1.13=h0d85af4_1004
- libxml2=2.9.12=h7e28ab6_1
- libxslt=1.1.33=h1acebb3_3
- libzip=1.8.0=h8b0c345_1
- libzlib=1.2.11=h6c3fc93_1014
- llvm-openmp=14.0.3=ha654fa7_0
- llvmlite=0.36.0=py39h798a4f4_0
- locket=1.0.0=pyhd8ed1ab_0
- lxml=4.8.0=py39h63b48b0_2
- lz4-c=1.9.3=he49afe7_1
- markupsafe=2.1.1=py39h63b48b0_1
- matplotlib-base=3.5.2=py39h64a0072_0
- matplotlib-inline=0.1.3=pyhd8ed1ab_0
- msgpack-python=1.0.3=py39h7248d28_1
- multidict=6.0.2=py39h63b48b0_1
- munkres=1.1.4=pyh9f0ad1d_0
- nbformat=5.4.0=pyhd8ed1ab_0
- nc-time-axis=1.4.1=pyhd8ed1ab_0
- ncurses=6.3=h96cf925_1
- netcdf-fortran=4.5.4=nompi_h9ed14b0_100
- netcdf4=1.5.8=nompi_py39he7d1c46_101
- nodeenv=1.6.0=pyhd8ed1ab_0
- nspr=4.32=hcd9eead_1
- nss=3.78=ha8197d3_0
- numba=0.53.1=py39h32e38f5_1
- numcodecs=0.9.1=py39h9fcab8e_2
- numexpr=2.8.0=py39hbd61c47_2
- numpy=1.22.4=py39h677350a_0
- numpy_groupies=0.9.16=pyhd8ed1ab_0
- openblas=0.3.20=openmp_h5ad848b_0
- openjpeg=2.4.0=h6e7aa92_1
- openssl=1.1.1o=hfe4f2af_0
- packaging=21.3=pyhd8ed1ab_0
- pandas=1.4.2=py39hbd61c47_1
- parso=0.8.3=pyhd8ed1ab_0
- partd=1.2.0=pyhd8ed1ab_0
- patsy=0.5.2=pyhd8ed1ab_0
- pcre=8.45=he49afe7_0
- pexpect=4.8.0=pyh9f0ad1d_2
- pickleshare=0.7.5=py_1003
- pillow=9.1.1=py39h579eac4_0
- pint=0.19.2=pyhd8ed1ab_0
- pip=22.1.1=pyhd8ed1ab_0
- pixman=0.40.0=hbcb3906_0
- platformdirs=2.5.1=pyhd8ed1ab_0
- pluggy=1.0.0=py39h6e9494a_3
- pooch=1.6.0=pyhd8ed1ab_0
- poppler=22.04.0=h39497b0_0
- poppler-data=0.4.11=hd8ed1ab_0
- postgresql=14.2=he8fe76e_0
- pre-commit=2.19.0=py39h6e9494a_0
- proj=9.0.0=h2364a93_1
- prompt-toolkit=3.0.29=pyha770c72_0
- pseudonetcdf=3.2.2=pyhd8ed1ab_0
- psutil=5.9.1=py39h701faf5_0
- pthread-stubs=0.4=hc929b4f_1001
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- py=1.11.0=pyh6c4a22f_0
- pycparser=2.21=pyhd8ed1ab_0
- pygments=2.12.0=pyhd8ed1ab_0
- pyopenssl=22.0.0=pyhd8ed1ab_0
- pyparsing=3.0.9=pyhd8ed1ab_0
- pyproj=3.3.1=py39hfe197cf_0
- pyrsistent=0.18.1=py39h63b48b0_1
- pyshp=2.3.0=pyhd8ed1ab_0
- pysocks=1.7.1=py39h6e9494a_5
- pytest=7.1.2=py39h6e9494a_0
- pytest-cov=3.0.0=pyhd8ed1ab_0
- pytest-env=0.6.2=py_0
- pytest-forked=1.4.0=pyhd8ed1ab_0
- pytest-xdist=2.5.0=pyhd8ed1ab_0
- python=3.9.12=h8b4d769_1_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-eccodes=1.4.0=py39h86b5767_1
- python-fastjsonschema=2.15.3=pyhd8ed1ab_0
- python-xxhash=3.0.0=py39h63b48b0_1
- python_abi=3.9=2_cp39
- pytz=2022.1=pyhd8ed1ab_0
- pyyaml=6.0=py39h63b48b0_4
- rasterio=1.2.10=py39hc9cfaa8_5
- readline=8.1=h05e3726_0
- requests=2.27.1=pyhd8ed1ab_0
- s3transfer=0.5.2=pyhd8ed1ab_0
- scipy=1.8.1=py39hfa1a3ab_0
- seaborn=0.11.2=hd8ed1ab_0
- seaborn-base=0.11.2=pyhd8ed1ab_0
- setuptools=62.3.2=py39h6e9494a_0
- shapely=1.8.2=py39hd471b09_1
- six=1.16.0=pyh6c4a22f_0
- snappy=1.1.9=h6e38e02_1
- snuggs=1.4.7=py_0
- sortedcontainers=2.4.0=pyhd8ed1ab_0
- sparse=0.13.0=pyhd8ed1ab_0
- sqlite=3.38.5=hd9f0692_0
- stack_data=0.2.0=pyhd8ed1ab_0
- statsmodels=0.13.2=py39hc89836e_0
- tblib=1.7.0=pyhd8ed1ab_0
- tiledb=2.8.3=hf17dc58_1
- tk=8.6.12=h5dbffcc_0
- toml=0.10.2=pyhd8ed1ab_0
- tomli=2.0.1=pyhd8ed1ab_0
- toolz=0.11.2=pyhd8ed1ab_0
- tornado=6.1=py39h63b48b0_3
- traitlets=5.2.1.post0=pyhd8ed1ab_0
- typing-extensions=4.2.0=hd8ed1ab_1
- typing_extensions=4.2.0=pyha770c72_1
- tzcode=2022a=h5eb16cf_0
- tzdata=2022a=h191b570_0
- udunits2=2.2.28=h06ef574_0
- ukkonen=1.0.1=py39h7248d28_2
- unicodedata2=14.0.0=py39h63b48b0_1
- urllib3=1.26.9=pyhd8ed1ab_0
- virtualenv=20.14.1=py39h6e9494a_0
- wcwidth=0.2.5=pyh9f0ad1d_2
- wheel=0.37.1=pyhd8ed1ab_0
- wrapt=1.14.1=py39h701faf5_0
- xerces-c=3.2.3=h6564042_4
- xorg-libxau=1.0.9=h35c211d_0
- xorg-libxdmcp=1.1.3=h35c211d_0
- xxhash=0.8.0=h35c211d_3
- xz=5.2.5=haf1e3a3_1
- yaml=0.2.5=h0d85af4_2
- yarl=1.7.2=py39h63b48b0_2
- zarr=2.11.3=pyhd8ed1ab_0
- zict=2.2.0=pyhd8ed1ab_0
- zipp=3.8.0=pyhd8ed1ab_0
- zlib=1.2.11=h6c3fc93_1014
- zstd=1.5.2=ha9df2e0_1
- pip:
- numbagg==0.2.1 prefix: /Users/mattia/mambaforge/envs/xarray
see:
- https://bugs.python.org/issue25457
- http://json-schema.org/understanding-json-schema/reference/object.html "in JSON all the keys must be strings"
I think I'd tend towards "Only strings are supported" but another sub-option would be to print a warning and automatically stringify.
Agree on only supporting strings, much easier for everyone.
@clbarnes (or anyone else): and thoughts on completely disallowing versus stringifying arguments?
Strong preference for documenting/ annotating that keys must be strings (and that values must be json-serialisable), and not stringifying non-string keys (which would be unpredictable for different types).
JavaScript gets pretty confusing with its string coercion, no need to open ourselves up to that. Python's got a whole lot better to develop in now there's more focus on sane types. Explicitly checking for key string-ness won't check for the JSON-validity the value (which might be a dict with non-string keys), so it's probably best to just leave it all up to the serialiser.
@clbarnes, objection noted. I guess I'm most worried about the impact of the breaking change since at the moment a single update (z.attrs[1] = "foo") gets you:
/tmp $cat test.zarr/.zattrs
{
"1": "foo"
}
Maybe a couple of versions of noisy deprecation warnings, then? Even if it never actually gets patched out until v3.
Thanks all. Rolling @malmans2's #1066 into 2.13 but leaving this open for the deprecation cycle.
Strike that see #1125