Jeremy Maitin-Shepard comments

Results: 455 comments of Jeremy Maitin-Shepard

Protocol extensions for awkward arrays

Let me try to explain how I understand your example in the context of using the existing zarr format (without irregular grid support): Suppose we pick a chunk size of...

Clarify status and semantics of object ('O') data type in storage spec

Definitely agreed that it should be dropped for now. It could be added back as a Python-specific python-object data type for use with a pickle codec, but not all uses...

v3: chunk_memory_layout could be specified as an explicit order rather than C or F

Note: This is the representation used by TensorStore: https://google.github.io/tensorstore/schema.html#json-ChunkLayout.inner_order
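To make the "explicit order" idea concrete, here is a minimal sketch of how a dimension permutation generalizes the C/F choice. It assumes the permutation lists dimensions from outermost (slowest-varying) to innermost (fastest-varying), so that `[0, 1, 2]` corresponds to C order and `[2, 1, 0]` to F order. The helper name `strides_for_order` is hypothetical and is not part of zarr or TensorStore.

```python
def strides_for_order(shape, order, itemsize=1):
    """Return byte strides for `shape` when dimensions are stored in `order`.

    Assumption: `order` is a permutation of range(len(shape)), listed from
    the outermost (slowest-varying) to the innermost (fastest-varying)
    dimension.
    """
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the dimension indices")
    strides = [0] * len(shape)
    step = itemsize
    # Walk from the innermost dimension outwards, accumulating the step size.
    for dim in reversed(order):
        strides[dim] = step
        step *= shape[dim]
    return tuple(strides)


shape = (2, 3, 4)
c_strides = strides_for_order(shape, [0, 1, 2])      # same as C order
f_strides = strides_for_order(shape, [2, 1, 0])      # same as F order
mixed_strides = strides_for_order(shape, [1, 0, 2])  # neither C nor F
```

The third case is the interesting one: a layout that a plain C/F flag cannot describe but a permutation can.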

v3: chunk_memory_layout could be specified as an explicit order rather than C or F

Here is one example: Suppose we are storing volumetric data indexed by x y z. It is natural to order the dimensions [x, y, z], or sometimes [z, y, x]...

v3: chunk_memory_layout could be specified as an explicit order rather than C or F

A better use case for this feature came up this evening: t5x (https://github.com/google-research/t5x) uses tensorstore to store machine learning model checkpoints. A user had modified the model to transpose the...

Negative chunk indexes, offset chunk origin

This issue has been discussed extensively both in #122 and in the Zarr community meeting. This is not planned to be part of the initial zarr...

Support for inf, nan, binary data in attributes

One difficulty with the current zarr-python approach is that it means the "JSON" metadata is not actually spec-compliant JSON and cannot be parsed by the JavaScript `JSON.parse` function or by...
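As an illustration of the compliance problem, using only Python's standard `json` module (not zarr-python itself): by default `json.dumps` emits bare `NaN` and `Infinity` tokens, which are not valid JSON. The helper `strict_loads` below is a hypothetical name for a parser that rejects such tokens, roughly how a strict parser would behave.

```python
import json

# Python's default serializer emits non-standard tokens for NaN and infinity.
doc = json.dumps({"scale": float("nan"), "limit": float("inf")})


def strict_loads(text):
    """Parse JSON, rejecting the non-standard NaN/Infinity/-Infinity tokens."""
    def reject(token):
        raise ValueError(f"non-standard JSON constant: {token}")
    return json.loads(text, parse_constant=reject)


def is_strict_json(text):
    """Return True if `text` parses without relying on non-standard tokens."""
    try:
        strict_loads(text)
    except ValueError:
        return False
    return True
```

Passing `allow_nan=False` to `json.dumps` makes the writer raise `ValueError` instead of producing such output, which is one way to catch the problem on the writing side.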

Support for inf, nan, binary data in attributes

For `fill_value` the data type is already known so there isn't an issue there. zarr-python for v2 already uses a different encoding for `fill_value`: infinity is encoded as `"Infinity"`...

Support for inf, nan, binary data in attributes

The difference between `fill_value` and user-defined attributes is that for `fill_value`, the data type is specified elsewhere in the metadata and can be used to decode whatever representation is used....
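A rough sketch of why a known data type removes the ambiguity: the same JSON string `"Infinity"` decodes to a float for a floating-point array but stays a string for a string array. The function `decode_fill_value`, the single-character `dtype_kind` argument, and the `"NaN"` and `"-Infinity"` spellings are assumptions made for illustration; only `"Infinity"` is taken from the discussion above.

```python
import math

# Assumed string spellings for special float values.
SPECIAL_FLOATS = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def decode_fill_value(raw, dtype_kind):
    """Decode a JSON-level fill value using the array's data type kind.

    `dtype_kind` is a simplified, hypothetical stand-in for the data type
    recorded in the metadata: 'f' (float), 'i'/'u' (integer), 'U' (string).
    """
    if dtype_kind == "f":
        if isinstance(raw, str):
            return SPECIAL_FLOATS[raw]
        return float(raw)
    if dtype_kind in ("i", "u"):
        return int(raw)
    if dtype_kind == "U":
        return raw  # a string is just a string here
    raise ValueError(f"unsupported dtype kind: {dtype_kind!r}")


as_float = decode_fill_value("Infinity", "f")   # the float infinity
as_string = decode_fill_value("Infinity", "U")  # the literal text
```

A user-defined attribute carries no such type information, so a reader seeing `"Infinity"` there cannot tell which of the two was meant.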

v2: clarify that unicode uses utf-32 encoding

Not sure how many other implementations even support it? A fixed-length sequence of utf-32-encoded code points seems unlikely to be particularly useful as a data type.
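For reference, a small sketch of what such a fixed-length element looks like on disk, assuming little-endian UTF-32 with zero padding (4 bytes per code point). The helper names are hypothetical, and the padding convention is an assumption rather than something taken from the spec text.

```python
def encode_fixed_utf32(text, length):
    """Encode `text` as exactly `length` UTF-32-LE code points, zero-padded."""
    code_points = len(text)  # Python 3 str length counts code points
    if code_points > length:
        raise ValueError("text does not fit in the fixed-length element")
    padding = b"\x00" * (4 * (length - code_points))
    return text.encode("utf-32-le") + padding


def decode_fixed_utf32(data):
    """Decode a fixed-length element, dropping trailing zero padding."""
    return data.decode("utf-32-le").rstrip("\x00")


cell = encode_fixed_utf32("ab", 4)  # 16 bytes for a 4-code-point element
text = decode_fixed_utf32(cell)
```

Every element costs 4 bytes per code point regardless of content, so mostly-ASCII data pays a fourfold overhead before compression.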