`np.float32(np.nan)` is not a valid attribute value
Zarr version
3.1.3
Numcodecs version
n/a
Python Version
3.12
Operating System
Mac
Installation
pip
Description
Prior to Zarr 3.1.0, it was possible to supply `np.float32(np.nan)` as an attribute value (however, I believe it was just turned into a string and was not round-trippable).
In 3.1.3, this raises `TypeError: Object of type float32 is not JSON serializable`.
Given that JSON doesn't distinguish between float32 and float64, I'm not quite sure what the correct behavior is.
Steps to reproduce
import numpy as np
import zarr

store = zarr.storage.MemoryStore()
array = zarr.create_array(
    shape=(10,), chunks=(5,), store=store, dtype='f4',
    attributes={"foo": np.float32(np.nan)}
)
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/zarr/core/metadata/v3.py:300, in ArrayV3Metadata.to_buffer_dict(self, prototype)
296 json_indent = config.get("json_indent")
297 d = self.to_dict()
298 return {
299 ZARR_JSON: prototype.buffer.from_bytes(
--> 300 json.dumps(d, allow_nan=True, indent=json_indent).encode()
301 )
302 }
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
232 if cls is None:
233 cls = JSONEncoder
234 return cls(
235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
237 separators=separators, default=default, sort_keys=sort_keys,
--> 238 **kw).encode(obj)
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:202, in JSONEncoder.encode(self, o)
200 chunks = self.iterencode(o, _one_shot=True)
201 if not isinstance(chunks, (list, tuple)):
--> 202 chunks = list(chunks)
203 return ''.join(chunks)
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:432, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
430 yield from _iterencode_list(o, _current_indent_level)
431 elif isinstance(o, dict):
--> 432 yield from _iterencode_dict(o, _current_indent_level)
433 else:
434 if markers is not None:
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:406, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
404 else:
405 chunks = _iterencode(value, _current_indent_level)
--> 406 yield from chunks
407 if newline_indent is not None:
408 _current_indent_level -= 1
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:406, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
404 else:
405 chunks = _iterencode(value, _current_indent_level)
--> 406 yield from chunks
407 if newline_indent is not None:
408 _current_indent_level -= 1
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:439, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
437 raise ValueError("Circular reference detected")
438 markers[markerid] = o
--> 439 o = _default(o)
440 yield from _iterencode(o, _current_indent_level)
441 if markers is not None:
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/json/encoder.py:180, in JSONEncoder.default(self, o)
161 def default(self, o):
162 """Implement this method in a subclass such that it returns
163 a serializable object for ``o``, or calls the base implementation
164 (to raise a ``TypeError``).
(...) 178
179 """
--> 180 raise TypeError(f'Object of type {o.__class__.__name__} '
181 f'is not JSON serializable')
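The traceback bottoms out in `JSONEncoder.default`, which the stdlib encoder only reaches for objects it has no built-in rule for. A quick check with plain `json` (no Zarr involved; it assumes only NumPy and mirrors the `json.dumps(d, allow_nan=True, ...)` call shown in the traceback) suggests why `float32` fails: `np.float64` subclasses Python's `float`, while `np.float32` does not.

```python
import json

import numpy as np

# np.float64 subclasses Python's float, so the stdlib encoder handles it natively
# (allow_nan=True lets NaN through as the non-standard token NaN).
f64_text = json.dumps({"foo": np.float64(np.nan)}, allow_nan=True)
print(f64_text)  # {"foo": NaN}

# np.float32 is not a float subclass, so the encoder falls through to
# JSONEncoder.default, which raises the same TypeError as in the traceback.
f32_error = None
try:
    json.dumps({"foo": np.float32(np.nan)}, allow_nan=True)
except TypeError as err:
    f32_error = str(err)
print(f32_error)  # Object of type float32 is not JSON serializable
```

At the `json` level, then, the problem appears to be the NumPy scalar type rather than NaN itself.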
> Given that JSON doesn't distinguish between float32 vs float64, I'm not quite sure what is the correct behavior.
My preference is for attribute encoding/decoding to be lossless, and I think 99% of the time people are best off handling JSON serialization themselves: choosing a JSON-serializable type for their data, and parsing the JSON accordingly.
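A minimal sketch of that user-side approach, assuming the reader knows out of band that `foo` should be a `float32`: write a built-in `float` and cast back on read. Every `float32` value is exactly representable as a `float64`, so the numeric round trip is exact, but the dtype itself is not recorded anywhere in the JSON.

```python
import json

import numpy as np

value = np.float32(0.1)

# Write: store a built-in float. float32 -> float64 is exact, so nothing is lost
# numerically, but the JSON number itself says nothing about float32.
encoded = json.dumps({"foo": float(value)}, allow_nan=True)
print(encoded)  # {"foo": 0.10000000149011612}

# Read: the application knows "foo" is float32 and casts back explicitly.
decoded = np.float32(json.loads(encoded)["foo"])
print(decoded == value)  # True

# NaN also survives, as the non-standard JSON token NaN (json.dumps allows it by default).
nan_back = np.float32(json.loads(json.dumps(float(np.float32(np.nan)))))
```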
But we could optionally expose an API that lets people provide their own attribute encoder/decoder, which could contain logic for NumPy scalars. There would be no guarantee that their data would be interpreted correctly by other Zarr implementations, or even by other Zarr-Python users, so we should consider this carefully.
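As an illustration only, this is roughly what a user-supplied encoder could look like. `NumpyScalarEncoder` is a hypothetical name, and nothing here assumes Zarr 3.1.3 accepts such a hook; the sketch just uses the stdlib `cls=` parameter of `json.dumps`. Note that `float32` and `int64` come out as plain JSON numbers, which is exactly the interoperability concern above: a reader cannot recover the original dtypes.

```python
import json

import numpy as np


class NumpyScalarEncoder(json.JSONEncoder):
    """Hypothetical attribute encoder: lowers NumPy scalars to Python builtins."""

    def default(self, o):
        if isinstance(o, np.generic):
            # .item() returns the nearest builtin: float32 -> float, int64 -> int.
            return o.item()
        return super().default(o)  # still raises TypeError for anything else


attrs = {"foo": np.float32(np.nan), "count": np.int64(3)}
text = json.dumps(attrs, cls=NumpyScalarEncoder, allow_nan=True)
print(text)  # {"foo": NaN, "count": 3}

# The dtypes are gone on the way back: both values decode as plain builtins.
decoded = json.loads(text)
print(type(decoded["foo"]).__name__, type(decoded["count"]).__name__)  # float int
```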
The specific case of encoding a numerical scalar in JSON is probably important enough to warrant some support. I think we could build on the idea of reusing the JSON encoding we use for fill values to implement a convention for encoding/decoding scalars in JSON. We could also take this further and support encoding n-dimensional arrays in attributes.
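One possible shape for such a convention, sketched under assumptions: the `{"data_type": ..., "value": ...}` wrapper and the helpers `encode_scalar` / `decode_scalar` are hypothetical and not part of any Zarr spec. Only the value encoding borrows from Zarr v3 floating-point fill values, which spell non-finite values as the JSON strings `"NaN"`, `"Infinity"` and `"-Infinity"`. That keeps the output strict JSON, hence `allow_nan=False` below.

```python
import json
import math

import numpy as np

# Non-finite floats are spelled as strings, as in Zarr v3 float fill values.
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def encode_scalar(x: np.floating) -> dict:
    """Hypothetical: wrap a NumPy float scalar as {"data_type", "value"}."""
    v = float(x)  # assumes float16/32/64, all of which convert to float64 exactly
    if math.isnan(v):
        value = "NaN"
    elif math.isinf(v):
        value = "Infinity" if v > 0 else "-Infinity"
    else:
        value = v
    return {"data_type": x.dtype.name, "value": value}


def decode_scalar(obj: dict) -> np.floating:
    """Hypothetical inverse of encode_scalar: rebuild the typed NumPy scalar."""
    raw = obj["value"]
    v = _SPECIAL[raw] if isinstance(raw, str) else raw
    return np.dtype(obj["data_type"]).type(v)


attrs = {"foo": encode_scalar(np.float32(np.nan))}
text = json.dumps(attrs, allow_nan=False)  # strict JSON: no bare NaN token
print(text)  # {"foo": {"data_type": "float32", "value": "NaN"}}

restored = decode_scalar(json.loads(text)["foo"])
print(type(restored).__name__, restored)  # float32 nan
```

Because the dtype travels with the value, `decode_scalar` can return an `np.float32` rather than a bare Python float, at the cost of a more verbose attribute.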