zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

Error when using attribute keys with mixed types

Open malmans2 opened this issue 3 years ago • 6 comments
trafficstars

Minimal, reproducible code sample, a copy-pastable example if possible

import zarr
z1 = zarr.open('data/example.zarr', mode='w', shape=(10, 10), dtype='i4')
z1.attrs.put({1: "bar"})  # OK
z1.attrs.put({"foo": "bar"})  # OK
z1.attrs.put({1: "bar", "foo": "bar"})  # Error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 z1.attrs.put({1: "bar", "foo": "bar"})

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:109, in Attributes.put(self, d)
    106 def put(self, d):
    107     """Overwrite all attributes with the key/value pairs in the provided dictionary
    108     `d` in a single operation."""
--> 109     self._write_op(self._put_nosync, d)

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:73, in Attributes._write_op(self, f, *args, **kwargs)
     71 # synchronization
     72 if self.synchronizer is None:
---> 73     return f(*args, **kwargs)
     74 else:
     75     with self.synchronizer[self.key]:

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:112, in Attributes._put_nosync(self, d)
    111 def _put_nosync(self, d):
--> 112     self.store[self.key] = json_dumps(d)
    113     if self.cache:
    114         self._cached_asdict = d

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/util.py:38, in json_dumps(o)
     36 def json_dumps(o: Any) -> bytes:
     37     """Write JSON in a consistent, human-readable way."""
---> 38     return json.dumps(o, indent=4, sort_keys=True, ensure_ascii=True,
     39                       separators=(',', ': ')).encode('ascii')

File ~/mambaforge/envs/xarray/lib/python3.9/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
--> 234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
    238     **kw).encode(obj)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:201, in JSONEncoder.encode(self, o)
    199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
--> 201     chunks = list(chunks)
    202 return ''.join(chunks)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
--> 431     yield from _iterencode_dict(o, _current_indent_level)
    432 else:
    433     if markers is not None:

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:353, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    351 first = True
    352 if _sort_keys:
--> 353     items = sorted(dct.items())
    354 else:
    355     items = dct.items()

TypeError: '<' not supported between instances of 'str' and 'int'

Problem description

I'm not able to assign attributes using keys with different types. Looks like the issue is coming from json that is trying to sort the keys. I guess there are 3 options:

  • Only strings are supported: Raise an error if unsupported types are used?
  • Mixed types are not supported: Raise an error if mixed types are used?
  • It's a bug: Do not sort the attributes by default?

Version and installation information

  • Value of zarr.__version__: 2.11.3
  • Value of numcodecs.__version__: 0.9.1
  • Version of Python interpreter: 3.9.12
  • Operating system (Linux/Windows/Mac): Mac
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda
conda env export

name: xarray channels:

  • conda-forge dependencies:
  • affine=2.3.1=pyhd8ed1ab_0
  • aiobotocore=2.3.2=pyhd8ed1ab_0
  • aiohttp=3.8.1=py39h63b48b0_1
  • aioitertools=0.10.0=pyhd8ed1ab_0
  • aiosignal=1.2.0=pyhd8ed1ab_0
  • antlr-python-runtime=4.7.2=py39h6e9494a_1003
  • appdirs=1.4.4=pyh9f0ad1d_0
  • appnope=0.1.3=pyhd8ed1ab_0
  • asciitree=0.3.3=py_2
  • asttokens=2.0.5=pyhd8ed1ab_0
  • async-timeout=4.0.2=pyhd8ed1ab_0
  • attrs=21.4.0=pyhd8ed1ab_0
  • backcall=0.2.0=pyh9f0ad1d_0
  • backports=1.0=py_2
  • backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  • blosc=1.21.1=h97e831e_3
  • boost-cpp=1.74.0=hdbf7018_7
  • boto3=1.21.21=pyhd8ed1ab_0
  • botocore=1.24.21=pyhd8ed1ab_1
  • bottleneck=1.3.4=py39h86b5767_1
  • brotli=1.0.9=h5eb16cf_7
  • brotli-bin=1.0.9=h5eb16cf_7
  • brotlipy=0.7.0=py39h63b48b0_1004
  • bzip2=1.0.8=h0d85af4_4
  • c-ares=1.18.1=h0d85af4_0
  • ca-certificates=2022.5.18.1=h033912b_0
  • cached-property=1.5.2=hd8ed1ab_1
  • cached_property=1.5.2=pyha770c72_1
  • cairo=1.16.0=h9e0e54b_1010
  • cartopy=0.20.2=py39hd74d15e_4
  • cdat_info=8.2.1=pyhd8ed1ab_2
  • cdms2=3.1.5=py39ha4b5fb4_14
  • cdtime=3.1.4=py39hef3cfc1_7
  • certifi=2022.5.18.1=py39h6e9494a_0
  • cf-units=3.0.1=py39hc89836e_2
  • cffi=1.15.0=py39he338e87_0
  • cfgrib=0.9.10.1=pyhd8ed1ab_0
  • cfgv=3.3.1=pyhd8ed1ab_0
  • cfitsio=4.1.0=h2c97ad1_0
  • cftime=1.6.0=py39h86b5767_1
  • charset-normalizer=2.0.12=pyhd8ed1ab_0
  • click=8.1.3=py39h6e9494a_0
  • click-plugins=1.1.1=py_0
  • cligj=0.7.2=pyhd8ed1ab_1
  • cloudpickle=2.1.0=pyhd8ed1ab_0
  • coverage=6.4=py39h701faf5_0
  • cryptography=37.0.2=py39h9c2a9ce_0
  • curl=7.83.1=h372c54d_0
  • cycler=0.11.0=pyhd8ed1ab_0
  • cytoolz=0.11.2=py39h63b48b0_2
  • dask-core=2022.5.1=pyhd8ed1ab_0
  • decorator=5.1.1=pyhd8ed1ab_0
  • distarray=2.12.2=pyhd8ed1ab_2
  • distlib=0.3.4=pyhd8ed1ab_0
  • distributed=2022.5.1=pyhd8ed1ab_0
  • eccodes=2.26.0=hf772d56_0
  • esmf=8.2.0=nompi_h92c19ad_0
  • esmpy=8.2.0=nompi_py39h8ffa011_1
  • execnet=1.9.0=pyhd8ed1ab_0
  • executing=0.8.3=pyhd8ed1ab_0
  • expat=2.4.8=h96cf925_0
  • fasteners=0.17.3=pyhd8ed1ab_0
  • filelock=3.7.0=pyhd8ed1ab_0
  • findlibs=0.0.2=pyhd8ed1ab_0
  • flox=0.5.3=pyhd8ed1ab_0
  • font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  • font-ttf-inconsolata=3.000=h77eed37_0
  • font-ttf-source-code-pro=2.038=h77eed37_0
  • font-ttf-ubuntu=0.83=hab24e00_0
  • fontconfig=2.14.0=h676cef8_0
  • fonts-conda-ecosystem=1=0
  • fonts-conda-forge=1=0
  • fonttools=4.33.3=py39h701faf5_0
  • freetype=2.10.4=h4cff582_1
  • freexl=1.0.6=h0d85af4_0
  • frozenlist=1.3.0=py39h63b48b0_1
  • fsspec=2022.5.0=pyhd8ed1ab_0
  • future=0.18.2=py39h6e9494a_5
  • g2clib=1.6.3=h7284f3b_1
  • geos=3.10.2=he49afe7_0
  • geotiff=1.7.1=had63758_1
  • gettext=0.19.8.1=hd1a6beb_1008
  • giflib=5.2.1=hbcb3906_2
  • h5netcdf=1.0.0=pyhd8ed1ab_0
  • h5py=3.6.0=nompi_py39hbc6cb89_100
  • hdf4=4.2.15=hefd3b78_3
  • hdf5=1.12.1=nompi_ha60fbc9_104
  • heapdict=1.0.1=py_0
  • hypothesis=6.46.8=pyhd8ed1ab_0
  • icu=69.1=he49afe7_0
  • identify=2.5.1=pyhd8ed1ab_0
  • idna=3.3=pyhd8ed1ab_0
  • importlib-metadata=4.11.4=py39h6e9494a_0
  • importlib_metadata=4.11.4=hd8ed1ab_0
  • importlib_resources=5.7.1=pyhd8ed1ab_1
  • iniconfig=1.1.1=pyh9f0ad1d_0
  • ipython=8.3.0=py39h6e9494a_0
  • iris=3.2.1=pyhd8ed1ab_0
  • jasper=2.0.33=h013e400_0
  • jedi=0.18.1=py39h6e9494a_1
  • jinja2=3.1.2=pyhd8ed1ab_0
  • jmespath=1.0.0=pyhd8ed1ab_0
  • jpeg=9e=h5eb16cf_1
  • json-c=0.16=h01d06f9_0
  • jsonschema=4.5.1=pyhd8ed1ab_0
  • jupyter_core=4.10.0=py39h6e9494a_0
  • kealib=1.4.14=ha22a8b1_3
  • kiwisolver=1.4.2=py39h7248d28_1
  • krb5=1.19.3=hb49756b_0
  • lazy-object-proxy=1.7.1=py39h63b48b0_1
  • lcms2=2.12=h577c468_0
  • lerc=3.0=he49afe7_0
  • libaec=1.0.6=he49afe7_0
  • libblas=3.9.0=14_osx64_openblas
  • libbrotlicommon=1.0.9=h5eb16cf_7
  • libbrotlidec=1.0.9=h5eb16cf_7
  • libbrotlienc=1.0.9=h5eb16cf_7
  • libcblas=3.9.0=14_osx64_openblas
  • libcdms=3.1.2=h7d8e03a_117
  • libcf=1.0.3=py39h8d2e2c5_114
  • libcurl=7.83.1=h372c54d_0
  • libcxx=14.0.3=hc203e6f_0
  • libdap4=3.20.6=h3e144a0_2
  • libdeflate=1.10=h0d85af4_0
  • libdrs=3.1.2=h102a9a4_118
  • libdrs_f=3.1.2=h2534672_114
  • libedit=3.1.20191231=h0678c8f_2
  • libev=4.33=haf1e3a3_1
  • libffi=3.4.2=h0d85af4_5
  • libgdal=3.4.3=h6474519_0
  • libgfortran=5.0.0=9_3_0_h6c81a4c_23
  • libgfortran5=9.3.0=h6c81a4c_23
  • libglib=2.70.2=hf1fb8c0_4
  • libiconv=1.16=haf1e3a3_0
  • libkml=1.3.0=h8fd9edb_1014
  • liblapack=3.9.0=14_osx64_openblas
  • libllvm10=10.0.1=h009f743_3
  • libnetcdf=4.8.1=nompi_h6609ca0_102
  • libnghttp2=1.47.0=h942079c_0
  • libopenblas=0.3.20=openmp_hb3cd9ec_0
  • libpng=1.6.37=h7cec526_2
  • libpq=14.2=hea3049e_0
  • librttopo=1.1.0=hec60dd8_9
  • libspatialite=5.0.1=hadde3e2_15
  • libssh2=1.10.0=h52ee1ee_2
  • libtiff=4.3.0=hfca7e8f_4
  • libuuid=2.32.1=h35c211d_1000
  • libwebp=1.2.2=h28dabe5_0
  • libwebp-base=1.2.2=h0d85af4_1
  • libxcb=1.13=h0d85af4_1004
  • libxml2=2.9.12=h7e28ab6_1
  • libxslt=1.1.33=h1acebb3_3
  • libzip=1.8.0=h8b0c345_1
  • libzlib=1.2.11=h6c3fc93_1014
  • llvm-openmp=14.0.3=ha654fa7_0
  • llvmlite=0.36.0=py39h798a4f4_0
  • locket=1.0.0=pyhd8ed1ab_0
  • lxml=4.8.0=py39h63b48b0_2
  • lz4-c=1.9.3=he49afe7_1
  • markupsafe=2.1.1=py39h63b48b0_1
  • matplotlib-base=3.5.2=py39h64a0072_0
  • matplotlib-inline=0.1.3=pyhd8ed1ab_0
  • msgpack-python=1.0.3=py39h7248d28_1
  • multidict=6.0.2=py39h63b48b0_1
  • munkres=1.1.4=pyh9f0ad1d_0
  • nbformat=5.4.0=pyhd8ed1ab_0
  • nc-time-axis=1.4.1=pyhd8ed1ab_0
  • ncurses=6.3=h96cf925_1
  • netcdf-fortran=4.5.4=nompi_h9ed14b0_100
  • netcdf4=1.5.8=nompi_py39he7d1c46_101
  • nodeenv=1.6.0=pyhd8ed1ab_0
  • nspr=4.32=hcd9eead_1
  • nss=3.78=ha8197d3_0
  • numba=0.53.1=py39h32e38f5_1
  • numcodecs=0.9.1=py39h9fcab8e_2
  • numexpr=2.8.0=py39hbd61c47_2
  • numpy=1.22.4=py39h677350a_0
  • numpy_groupies=0.9.16=pyhd8ed1ab_0
  • openblas=0.3.20=openmp_h5ad848b_0
  • openjpeg=2.4.0=h6e7aa92_1
  • openssl=1.1.1o=hfe4f2af_0
  • packaging=21.3=pyhd8ed1ab_0
  • pandas=1.4.2=py39hbd61c47_1
  • parso=0.8.3=pyhd8ed1ab_0
  • partd=1.2.0=pyhd8ed1ab_0
  • patsy=0.5.2=pyhd8ed1ab_0
  • pcre=8.45=he49afe7_0
  • pexpect=4.8.0=pyh9f0ad1d_2
  • pickleshare=0.7.5=py_1003
  • pillow=9.1.1=py39h579eac4_0
  • pint=0.19.2=pyhd8ed1ab_0
  • pip=22.1.1=pyhd8ed1ab_0
  • pixman=0.40.0=hbcb3906_0
  • platformdirs=2.5.1=pyhd8ed1ab_0
  • pluggy=1.0.0=py39h6e9494a_3
  • pooch=1.6.0=pyhd8ed1ab_0
  • poppler=22.04.0=h39497b0_0
  • poppler-data=0.4.11=hd8ed1ab_0
  • postgresql=14.2=he8fe76e_0
  • pre-commit=2.19.0=py39h6e9494a_0
  • proj=9.0.0=h2364a93_1
  • prompt-toolkit=3.0.29=pyha770c72_0
  • pseudonetcdf=3.2.2=pyhd8ed1ab_0
  • psutil=5.9.1=py39h701faf5_0
  • pthread-stubs=0.4=hc929b4f_1001
  • ptyprocess=0.7.0=pyhd3deb0d_0
  • pure_eval=0.2.2=pyhd8ed1ab_0
  • py=1.11.0=pyh6c4a22f_0
  • pycparser=2.21=pyhd8ed1ab_0
  • pygments=2.12.0=pyhd8ed1ab_0
  • pyopenssl=22.0.0=pyhd8ed1ab_0
  • pyparsing=3.0.9=pyhd8ed1ab_0
  • pyproj=3.3.1=py39hfe197cf_0
  • pyrsistent=0.18.1=py39h63b48b0_1
  • pyshp=2.3.0=pyhd8ed1ab_0
  • pysocks=1.7.1=py39h6e9494a_5
  • pytest=7.1.2=py39h6e9494a_0
  • pytest-cov=3.0.0=pyhd8ed1ab_0
  • pytest-env=0.6.2=py_0
  • pytest-forked=1.4.0=pyhd8ed1ab_0
  • pytest-xdist=2.5.0=pyhd8ed1ab_0
  • python=3.9.12=h8b4d769_1_cpython
  • python-dateutil=2.8.2=pyhd8ed1ab_0
  • python-eccodes=1.4.0=py39h86b5767_1
  • python-fastjsonschema=2.15.3=pyhd8ed1ab_0
  • python-xxhash=3.0.0=py39h63b48b0_1
  • python_abi=3.9=2_cp39
  • pytz=2022.1=pyhd8ed1ab_0
  • pyyaml=6.0=py39h63b48b0_4
  • rasterio=1.2.10=py39hc9cfaa8_5
  • readline=8.1=h05e3726_0
  • requests=2.27.1=pyhd8ed1ab_0
  • s3transfer=0.5.2=pyhd8ed1ab_0
  • scipy=1.8.1=py39hfa1a3ab_0
  • seaborn=0.11.2=hd8ed1ab_0
  • seaborn-base=0.11.2=pyhd8ed1ab_0
  • setuptools=62.3.2=py39h6e9494a_0
  • shapely=1.8.2=py39hd471b09_1
  • six=1.16.0=pyh6c4a22f_0
  • snappy=1.1.9=h6e38e02_1
  • snuggs=1.4.7=py_0
  • sortedcontainers=2.4.0=pyhd8ed1ab_0
  • sparse=0.13.0=pyhd8ed1ab_0
  • sqlite=3.38.5=hd9f0692_0
  • stack_data=0.2.0=pyhd8ed1ab_0
  • statsmodels=0.13.2=py39hc89836e_0
  • tblib=1.7.0=pyhd8ed1ab_0
  • tiledb=2.8.3=hf17dc58_1
  • tk=8.6.12=h5dbffcc_0
  • toml=0.10.2=pyhd8ed1ab_0
  • tomli=2.0.1=pyhd8ed1ab_0
  • toolz=0.11.2=pyhd8ed1ab_0
  • tornado=6.1=py39h63b48b0_3
  • traitlets=5.2.1.post0=pyhd8ed1ab_0
  • typing-extensions=4.2.0=hd8ed1ab_1
  • typing_extensions=4.2.0=pyha770c72_1
  • tzcode=2022a=h5eb16cf_0
  • tzdata=2022a=h191b570_0
  • udunits2=2.2.28=h06ef574_0
  • ukkonen=1.0.1=py39h7248d28_2
  • unicodedata2=14.0.0=py39h63b48b0_1
  • urllib3=1.26.9=pyhd8ed1ab_0
  • virtualenv=20.14.1=py39h6e9494a_0
  • wcwidth=0.2.5=pyh9f0ad1d_2
  • wheel=0.37.1=pyhd8ed1ab_0
  • wrapt=1.14.1=py39h701faf5_0
  • xerces-c=3.2.3=h6564042_4
  • xorg-libxau=1.0.9=h35c211d_0
  • xorg-libxdmcp=1.1.3=h35c211d_0
  • xxhash=0.8.0=h35c211d_3
  • xz=5.2.5=haf1e3a3_1
  • yaml=0.2.5=h0d85af4_2
  • yarl=1.7.2=py39h63b48b0_2
  • zarr=2.11.3=pyhd8ed1ab_0
  • zict=2.2.0=pyhd8ed1ab_0
  • zipp=3.8.0=pyhd8ed1ab_0
  • zlib=1.2.11=h6c3fc93_1014
  • zstd=1.5.2=ha9df2e0_1
  • pip:
    • numbagg==0.2.1 prefix: /Users/mattia/mambaforge/envs/xarray

malmans2 avatar May 25 '22 11:05 malmans2

see:

  • https://bugs.python.org/issue25457
  • http://json-schema.org/understanding-json-schema/reference/object.html "in JSON all the keys must be strings"

I think I'd tend towards "Only strings are supported" but another sub-option would be to print a warning and automatically stringify.

joshmoore avatar Jun 23 '22 08:06 joshmoore

Agree on only supporting strings, much easier for everyone.

clbarnes avatar Jun 29 '22 14:06 clbarnes

@clbarnes (or anyone else): and thoughts on completely disallowing versus stringifying arguments?

joshmoore avatar Jul 04 '22 13:07 joshmoore

Strong preference for documenting/ annotating that keys must be strings (and that values must be json-serialisable), and not stringifying non-string keys (which would be unpredictable for different types).

JavaScript gets pretty confusing with its string coercion, no need to open ourselves up to that. Python's got a whole lot better to develop in now there's more focus on sane types. Explicitly checking for key string-ness won't check for the JSON-validity the value (which might be a dict with non-string keys), so it's probably best to just leave it all up to the serialiser.

clbarnes avatar Jul 04 '22 13:07 clbarnes

@clbarnes, objection noted. I guess I'm most worried about the impact of the breaking change since at the moment a single update (z.attrs[1] = "foo") gets you:

/tmp $cat test.zarr/.zattrs
{
    "1": "foo"
}

joshmoore avatar Jul 11 '22 12:07 joshmoore

Maybe a couple of versions of noisy deprecation warnings, then? Even if it never actually gets patched out until v3.

clbarnes avatar Jul 11 '22 12:07 clbarnes

Thanks all. Rolling @malmans2's #1066 into 2.13 but leaving this open for the deprecation cycle.

joshmoore avatar Sep 08 '22 06:09 joshmoore

Strike that see #1125

joshmoore avatar Sep 08 '22 06:09 joshmoore