tensorstore
"<i1" and "<u1" unsupported by zarr driver
When trying to write an array of bytes, an error is reported: INVALID_ARGUMENT: Error parsing object member "metadata": Error parsing object member "dtype": Unsupported zarr dtype: "<i1".
When trying to read, the error is slightly different: FAILED_PRECONDITION: Error opening "zarr" driver: Error reading local file "C:/a/cthead1.zarr/image/.zarray": Error parsing object member "dtype": Unsupported zarr dtype: "<u1" [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{}},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/a/cthead1.zarr/image/\"},\"recheck_cached_data\":false,\"recheck_cached_metadata\":false}']
Currently tensorstore expects those data types to be "|i1" and "|u1" since for a 1-byte type endianness does not matter.
zarr-python also appears to use "|i1" and "|u1" rather than "<i1" and "<u1". However, it would be reasonable for tensorstore to also accept "<i1", ">i1", "<u1", and ">u1" as equivalent if that helps interoperability.
Which zarr implementation is producing those dtypes?
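For illustration, the equivalence described in this comment could be sketched as a small normalization step applied to the dtype string before it is parsed. This is only a sketch: `NormalizeSingleByteDtype` is a hypothetical helper name, not a tensorstore function, and it covers only the four spellings mentioned in this thread.

```cpp
#include <string>

// Hypothetical helper, not part of tensorstore.
// Maps "<i1", ">i1", "<u1" and ">u1" to the "|"-prefixed spelling,
// since byte order is meaningless for a 1-byte type.
// Every other dtype string is returned unchanged.
std::string NormalizeSingleByteDtype(const std::string& dtype)
{
    const bool has_order_prefix =
        dtype.size() == 3 && (dtype[0] == '<' || dtype[0] == '>');
    const bool is_one_byte_int =
        dtype.size() == 3 && (dtype[1] == 'i' || dtype[1] == 'u') && dtype[2] == '1';

    if (has_order_prefix && is_one_byte_int)
    {
        return "|" + dtype.substr(1);
    }
    return dtype;
}
```

With this, "<u1" and ">u1" both become "|u1", while multi-byte dtypes such as "<i2" keep their byte-order prefix.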
NCZarr produced that file with "<u1". Supporting (and ignoring) an endianness prefix for single-byte types is preferable, since they can then be treated the same way as other integer types, e.g.:
    std::string dtype;
    if (SystemIsBigEndian())
    {
        dtype = ">";
    }
    else
    {
        dtype = "<";
    }
    if (std::numeric_limits<ElementType>::is_integer)
    {
        if (std::numeric_limits<ElementType>::is_signed)
        {
            dtype += 'i';
        }
        else
        {
            dtype += 'u';
        }
    }
    else
    {
        dtype += 'f';
    }
    dtype += std::to_string(sizeof(ElementType));
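Until a reader accepts "<u1", a writer that can be changed could emit the "|" prefix for single-byte types, which is the spelling tensorstore currently expects. The following is a self-contained sketch of the snippet above with that one special case added. The function name `ZarrDtypeString` is made up for this example, and `SystemIsBigEndian` is a stand-in written here because the original helper is not shown in the thread.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

// Stand-in for the SystemIsBigEndian() helper referenced above (assumed).
// Inspects the first byte of a 16-bit value holding 1.
inline bool SystemIsBigEndian()
{
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 0;
}

// Hypothetical wrapper: builds a zarr v2 style dtype string for ElementType,
// using "|" instead of "<" or ">" when the element is a single byte.
template <typename ElementType>
std::string ZarrDtypeString()
{
    static_assert(std::is_arithmetic<ElementType>::value,
                  "only integer and floating-point element types are handled");

    std::string dtype;
    if (sizeof(ElementType) == 1)
    {
        dtype = "|";  // byte order does not apply
    }
    else
    {
        dtype = SystemIsBigEndian() ? ">" : "<";
    }

    if (std::numeric_limits<ElementType>::is_integer)
    {
        dtype += std::numeric_limits<ElementType>::is_signed ? 'i' : 'u';
    }
    else
    {
        dtype += 'f';
    }
    dtype += std::to_string(sizeof(ElementType));
    return dtype;
}
```

For example, `ZarrDtypeString<std::uint8_t>()` yields "|u1", while `ZarrDtypeString<std::int16_t>()` yields "<i2" or ">i2" depending on the host byte order.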