zarr-python

`np.float32(np.nan)` is not a valid attribute value

Open rabernat opened this issue 3 months ago • 2 comments

Zarr version

3.1.3

Numcodecs version

n/a

Python Version

3.12

Operating System

Mac

Installation

pip

Description

Prior to Zarr 3.1.0, it was possible to supply a value of `np.float32(np.nan)` as an attribute. (However, I believe this was just turned into a string and was not round-trippable.)

In 3.1.3, this raises `TypeError: Object of type float32 is not JSON serializable`.

Given that JSON doesn't distinguish between float32 vs float64, I'm not quite sure what is the correct behavior.

Steps to reproduce


import numpy as np
import zarr

store = zarr.storage.MemoryStore()
array = zarr.create_array(
    shape=(10,), chunks=(5,), store=store, dtype='f4',
    attributes={"foo": np.float32(np.nan)}
)
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/zarr/core/metadata/v3.py:300, in ArrayV3Metadata.to_buffer_dict(self, prototype)
    296 json_indent = config.get("json_indent")
    297 d = self.to_dict()
    298 return {
    299     ZARR_JSON: prototype.buffer.from_bytes(
--> 300         json.dumps(d, allow_nan=True, indent=json_indent).encode()
    301     )
    302 }

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
    234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
--> 238     **kw).encode(obj)

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:202, in JSONEncoder.encode(self, o)
    200 chunks = self.iterencode(o, _one_shot=True)
    201 if not isinstance(chunks, (list, tuple)):
--> 202     chunks = list(chunks)
    203 return ''.join(chunks)

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:432, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    430     yield from _iterencode_list(o, _current_indent_level)
    431 elif isinstance(o, dict):
--> 432     yield from _iterencode_dict(o, _current_indent_level)
    433 else:
    434     if markers is not None:

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:406, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    404         else:
    405             chunks = _iterencode(value, _current_indent_level)
--> 406         yield from chunks
    407 if newline_indent is not None:
    408     _current_indent_level -= 1

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:406, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    404         else:
    405             chunks = _iterencode(value, _current_indent_level)
--> 406         yield from chunks
    407 if newline_indent is not None:
    408     _current_indent_level -= 1

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:439, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    437         raise ValueError("Circular reference detected")
    438     markers[markerid] = o
--> 439 o = _default(o)
    440 yield from _iterencode(o, _current_indent_level)
    441 if markers is not None:

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:180, in JSONEncoder.default(self, o)
    161 def default(self, o):
    162     """Implement this method in a subclass such that it returns
    163     a serializable object for ``o``, or calls the base implementation
    164     (to raise a ``TypeError``).
   (...)    178 
    179     """
--> 180     raise TypeError(f'Object of type {o.__class__.__name__} '
    181                     f'is not JSON serializable')

Additional output

No response

rabernat, Sep 22 '25 14:09

> Given that JSON doesn't distinguish between float32 vs float64, I'm not quite sure what is the correct behavior.

My preference is for attributes encoding / decoding to be lossless, and I think 99% of the time people are best off handling JSON serialization by choosing a JSON serializable type for their data, and parsing the JSON accordingly.

But we could optionally expose an API for people to provide their own attributes encoder / decoder that could contain logic for NumPy scalars. There would be no guarantee that their data would be interpreted correctly by other Zarr implementations, or even other Zarr Python users, so we should consider this pretty carefully.
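To make the idea of a user-supplied encoder concrete, here is a rough sketch built on the standard library's `json.JSONEncoder`. The class name `NumpyScalarEncoder` is hypothetical, and zarr-python does not (as far as this thread shows) accept such an encoder today; the sketch only shows what the NumPy-scalar logic might look like and why the result is lossy.

```python
import json

import numpy as np


class NumpyScalarEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy scalars to built-in Python types."""

    def default(self, o):
        if isinstance(o, np.generic):
            # .item() returns the closest built-in type (int, float, bool, ...).
            return o.item()
        # Anything else keeps the stdlib behavior (raises TypeError).
        return super().default(o)


attrs = {"foo": np.float32(np.nan), "count": np.int64(3)}
text = json.dumps(attrs, cls=NumpyScalarEncoder, allow_nan=True)

# Decoding yields plain Python values: the float32 / int64 widths are gone.
decoded = json.loads(text)
print(text)
print(type(decoded["foo"]), type(decoded["count"]))
```

The decoded values are a plain `float` and `int`, which is the interoperability concern raised above: a reader has no way to know the original dtypes unless both sides agree on an extra convention.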

d-v-b, Sep 22 '25 14:09

The specific case of encoding a numerical scalar in JSON is probably important enough to warrant some support. I think we could build on the idea of re-using the JSON encoding we use for fill values to implement a convention for encoding / decoding scalars in JSON. We could also take this further and support encoding n-dimensional arrays in attributes.
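As a sketch of what such a convention might look like: the Zarr v3 specification writes non-finite float fill values as the strings `"NaN"`, `"Infinity"`, and `"-Infinity"`, and the example below reuses that for attribute scalars. The `{"dtype": ..., "value": ...}` wrapper and the function names are assumptions made up for illustration; they are not part of the Zarr specification or of zarr-python.

```python
import json
import math
import struct


def encode_float(value: float) -> "float | str":
    """Encode a float using the fill-value strings for non-finite values."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return value


def decode_float(raw, dtype: str = "float64") -> float:
    """Invert encode_float, rounding to float32 precision if requested."""
    special = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}
    value = special[raw] if isinstance(raw, str) else float(raw)
    if dtype == "float32":
        # Round-trip through 4 bytes to recover the float32 value.
        value = struct.unpack("<f", struct.pack("<f", value))[0]
    return value


def encode_scalar(value: float, dtype: str) -> dict:
    """Hypothetical tagged form that records the dtype next to the value."""
    return {"dtype": dtype, "value": encode_float(value)}


def decode_scalar(obj: dict) -> float:
    """Decode the hypothetical tagged form back to a Python float."""
    return decode_float(obj["value"], obj["dtype"])


# allow_nan=False proves the output is strict JSON (no bare NaN token).
doc = json.dumps({"foo": encode_scalar(math.nan, "float32")}, allow_nan=False)
restored = decode_scalar(json.loads(doc)["foo"])
print(doc)
```

Because the NaN is stored as a string, the document stays valid under strict JSON parsers, and the recorded dtype lets a reader that knows the convention reconstruct a float32. Readers that do not know the convention would just see a nested object, which is the trade-off discussed above.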

d-v-b, Sep 22 '25 14:09