
"<i1" and "<u1" unsupported by zarr driver

Open dzenanz opened this issue 3 years ago • 2 comments

When trying to write an array of bytes, an error is reported: INVALID_ARGUMENT: Error parsing object member "metadata": Error parsing object member "dtype": Unsupported zarr dtype: "<i1".

When trying to read, the error is slightly different: FAILED_PRECONDITION: Error opening "zarr" driver: Error reading local file "C:/a/cthead1.zarr/image/.zarray": Error parsing object member "dtype": Unsupported zarr dtype: "<u1" [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{}},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/a/cthead1.zarr/image/\"},\"recheck_cached_data\":false,\"recheck_cached_metadata\":false}']

dzenanz, Oct 28 '22 15:10

Currently tensorstore expects those data types to be "|i1" and "|u1" since for a 1-byte type endianness does not matter.

zarr-python also appears to use "|i1" and "|u1" rather than "<i1" and "<u1". However, it would be reasonable for tensorstore to also accept "<i1", ">i1", "<u1", and ">u1" as equivalent, if that helps interoperability.
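To illustrate the equivalence described above, here is a minimal sketch of a normalization step that maps the byte-order-prefixed 1-byte integer dtypes to the "|" form. The function name NormalizeSingleByteDtype is hypothetical and is not part of tensorstore; it only shows the idea and is not how the zarr driver parses dtypes.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a tensorstore API): rewrites "<i1", ">i1", "<u1",
// and ">u1" to "|i1" / "|u1". Every other dtype string is returned unchanged.
std::string NormalizeSingleByteDtype(std::string_view dtype) {
  std::string result(dtype);
  const bool has_order_prefix =
      result.size() == 3 && (result[0] == '<' || result[0] == '>');
  const bool is_one_byte_int = has_order_prefix &&
                               (result[1] == 'i' || result[1] == 'u') &&
                               result[2] == '1';
  if (is_one_byte_int) {
    result[0] = '|';  // byte order is meaningless for a single byte
  }
  return result;
}

// Usage example: 1-byte integers are normalized, wider types are untouched.
inline void CheckNormalizeExamples() {
  assert(NormalizeSingleByteDtype("<u1") == "|u1");
  assert(NormalizeSingleByteDtype(">i1") == "|i1");
  assert(NormalizeSingleByteDtype("<i2") == "<i2");
}
```

Applying such a step before dtype validation would let files written with "<u1" be read without changing how multi-byte types are handled.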

Which zarr implementation is producing those dtypes?

jbms, Oct 28 '22 16:10

NCZarr produced that file with <u1. Supporting (and ignoring) endianness for bytes is preferable, because bytes can then be treated the same way as other integer types, e.g.:

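// ElementType and SystemIsBigEndian() are defined elsewhere in the calling code.
std::string dtype;
// Byte-order prefix: '>' for big-endian hosts, '<' for little-endian hosts.
if (SystemIsBigEndian())
{
  dtype = ">";
}
else
{
  dtype = "<";
}
// Kind character: 'i' signed integer, 'u' unsigned integer, 'f' floating point.
if (std::numeric_limits<ElementType>::is_integer)
{
  if (std::numeric_limits<ElementType>::is_signed)
  {
    dtype += 'i';
  }
  else
  {
    dtype += 'u';
  }
}
else
{
  dtype += 'f';
}
// Element size in bytes, e.g. "<u1" for unsigned char on a little-endian host.
dtype += std::to_string(sizeof(ElementType));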
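
For writers that need to interoperate with readers expecting "|i1" and "|u1", a self-contained variant of the snippet above could special-case 1-byte types. This is only a sketch: ZarrDtypeFor and HostIsBigEndian are hypothetical names (HostIsBigEndian stands in for SystemIsBigEndian), and bool is excluded because its zarr dtype is not covered in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical stand-in for SystemIsBigEndian(): inspects the first byte of a
// known 16-bit value as it is laid out in memory.
inline bool HostIsBigEndian() {
  const std::uint16_t probe = 0x0102;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0x01;
}

// Hypothetical helper: builds a zarr dtype string for ElementType, using the
// "|" prefix for 1-byte types and the host byte order for wider types.
template <typename ElementType>
std::string ZarrDtypeFor() {
  static_assert(std::is_arithmetic<ElementType>::value &&
                    !std::is_same<ElementType, bool>::value,
                "only non-bool arithmetic types are handled in this sketch");
  std::string dtype;
  if (sizeof(ElementType) == 1) {
    dtype = "|";  // byte order does not apply to a single byte
  } else {
    dtype = HostIsBigEndian() ? ">" : "<";
  }
  if (std::is_integral<ElementType>::value) {
    dtype += std::is_signed<ElementType>::value ? 'i' : 'u';
  } else {
    dtype += 'f';
  }
  dtype += std::to_string(sizeof(ElementType));
  return dtype;
}

// Usage example: the prefix of wider types depends on the host, so only the
// kind and size are checked for them.
inline void CheckDtypeExamples() {
  assert(ZarrDtypeFor<std::uint8_t>() == "|u1");
  assert(ZarrDtypeFor<std::int8_t>() == "|i1");
  assert(ZarrDtypeFor<std::int32_t>().substr(1) == "i4");
}
```

The trade-off is one extra branch in the writer, compared with the uniform treatment that reader-side support for "<u1" would allow.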

dzenanz, Oct 28 '22 17:10