r* data type should be parametrized via configuration

Open d-v-b opened this issue 8 months ago • 2 comments

Instances of the r* data type are parametrized by a length, so no zarr metadata will contain {..., "data_type": "r*"} but rather {..., "data_type": "r8"} or similar. As a result of this design, the r* data type does not have a fixed name, unlike all the other data types defined in the spec. An alternative specification of the data type could easily result in a fixed name:

{
  "name": "r*",
  "configuration": {
    "length": <length in bits>
  }
}
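
As a rough sketch of how an implementation might consume this proposed form, the Python below validates the object and extracts the length. The function name `raw_length_bits` and the concrete length of 16 are hypothetical, chosen only for illustration; the multiple-of-8 check follows the spec wording quoted later in this thread.

```python
import json

RAW_NAME = "r*"  # fixed name from the proposed form above


def raw_length_bits(data_type: dict) -> int:
    """Return the length in bits of a proposed-form r* data type.

    Hypothetical helper: not part of any Zarr implementation.
    """
    if data_type.get("name") != RAW_NAME:
        raise ValueError(f"not an r* data type: {data_type!r}")
    length = data_type["configuration"]["length"]
    # The spec text describes the bit count as a multiple of 8.
    if not isinstance(length, int) or length <= 0 or length % 8 != 0:
        raise ValueError(f"length must be a positive multiple of 8, got {length!r}")
    return length


# Example metadata fragment; the length of 16 is an assumed value.
metadata = json.loads('{"name": "r*", "configuration": {"length": 16}}')
print(raw_length_bits(metadata))
```

With this shape the name is a constant, so the data type can be looked up like any other named type and the length is validated separately.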

d-v-b, Mar 10 '25 16:03

The current design prevents implementations from creating finite mappings between string names and data types. I imagine the example of the r* data type could also be confusing to people creating data type extensions.
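
To illustrate the point about finite mappings, here is a minimal Python sketch. The `FIXED_DTYPES` table, its contents, and the `item_size` function are hypothetical and not taken from any Zarr implementation. Fixed names resolve with a plain dictionary lookup, while the current `r<bits>` names need a separate pattern-matching branch because the set of valid names is unbounded.

```python
import re

# Hypothetical finite registry: data type name -> item size in bytes.
FIXED_DTYPES = {
    "bool": 1,
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "float32": 4,
    "float64": 8,
}

# Names such as "r8" or "r16" cannot be enumerated in a finite table.
RAW_PATTERN = re.compile(r"r(\d+)")


def item_size(data_type: str) -> int:
    """Return the item size in bytes for a data type name."""
    if data_type in FIXED_DTYPES:
        return FIXED_DTYPES[data_type]
    match = RAW_PATTERN.fullmatch(data_type)
    if match:
        bits = int(match.group(1))
        # The spec text describes the bit count as a multiple of 8.
        if bits == 0 or bits % 8 != 0:
            raise ValueError(f"invalid raw length in {data_type!r}")
        return bits // 8
    raise ValueError(f"unknown data type: {data_type!r}")
```

Every consumer of data type names has to repeat the special case in the second branch, which is the cost the fixed-name alternative would avoid.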

The spec says:

Note

We are explicitly looking for more feedback and prototypes of code using the r*, raw bits, for various endianness and whether the spec could be made clearer.

Does this mean the design of the r* data type is provisional?

d-v-b, Mar 11 '25 11:03

I agree that the language here is also not clear, e.g.,

In addition to these base types, an implementation should also handle the raw/opaque pass-through type designated by the lower-case letter r followed by the number of bits, multiple of 8.

does not capitalize the "should".

I'm not sure, though, whether I would go so far as to call it "provisional" (I guess we'd need to define that term for ourselves first). My inclination would be to start a deprecation process for the r* data types if the dynamic nature of the naming causes issues.

joshmoore, Mar 16 '25 15:03