
Adding C++ example/tutorial to the documentation

Open sameeul opened this issue 2 years ago • 31 comments

Currently, the documentation has Python tutorials. Would it be possible to add some minimal working examples for the C++ API? Something similar to this: https://google.github.io/tensorstore/python/tutorial.html

sameeul avatar Jul 26 '22 18:07 sameeul

We are working on adding C++ API documentation and examples, but unfortunately haven't had the time to finish that yet. In the meantime you can look here for some usage examples:

https://github.com/google/tensorstore/blob/master/tensorstore/driver/zarr/driver_test.cc

jbms avatar Jul 26 '22 18:07 jbms

Thanks. If I may ask some naive questions based on the example code there: I have the following code sample

int main(int argc, char** argv) {
  tensorstore::Context context = tensorstore::Context::Default();
  TENSORSTORE_CHECK_OK_AND_ASSIGN(auto store, tensorstore::Open({{"driver", "zarr"},
                            {"kvstore", {{"driver", "file"},
                                         {"path", "p01_x01_y01_wx0_wy0_c1.ome.zarr"}}
                            }},
                            context,
                            tensorstore::OpenMode::open,
                            tensorstore::RecheckCached{false},
                            tensorstore::ReadWriteMode::read).result());

  return 0;
}

When I run this, I get the following message: INVALID_ARGUMENT: Error parsing object member "driver": "zarr" is not registered.

It is not clear to me how to register a driver.

sameeul avatar Jul 26 '22 20:07 sameeul

Assuming you are using bazel to build, drivers are registered by including the appropriate driver target as a dependency in your build. For zarr and file drivers you need:

//tensorstore/driver/zarr and //tensorstore/kvstore/file

To include all drivers you can instead add as dependencies:

//tensorstore:all_drivers
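
For example, a sketch of a BUILD file (the target and source file names here are placeholders; this assumes you build inside the tensorstore workspace — from an external workspace, prefix the labels with the tensorstore repository name):

```starlark
cc_binary(
    name = "zarr_example",  # hypothetical target/file names
    srcs = ["zarr_example.cc"],
    deps = [
        "//tensorstore:context",
        "//tensorstore:open",
        # Registers the drivers referenced in the spec:
        "//tensorstore/driver/zarr",
        "//tensorstore/kvstore/file",
        # Or, instead of listing individual drivers:
        # "//tensorstore:all_drivers",
    ],
)
```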

jbms avatar Jul 26 '22 20:07 jbms

Thanks! Worked like a charm!

sameeul avatar Jul 26 '22 20:07 sameeul

You can also look at some of the C++ examples in:

https://github.com/google/tensorstore/tree/master/tensorstore/examples

laramiel avatar Jul 28 '22 02:07 laramiel

Having a C++ example and/or tutorial would be good. Tests are focused on testing - which would be totally fine if a user example existed.

dzenanz avatar Oct 24 '22 21:10 dzenanz

My attempt at reading a zarr file and examining what I read ran into a roadblock:

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto store = tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                                 context,
                                 tensorstore::OpenMode::open,
                                 tensorstore::RecheckCached{ false },
                                 tensorstore::ReadWriteMode::read)
                 .result();
  std::cout << store.domain().shape();

  return EXIT_SUCCESS;
}
13>------ Build started: Project: tester, Configuration: Debug x64 ------
13>tester.cpp
13>C:\Misc\Tester\tester.cpp(18,22): error C2039: 'domain': is not a member of 'tensorstore::Result<tensorstore::TensorStore<void,-1,tensorstore::ReadWriteMode::dynamic>>'
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/util/future_impl.h(502): message : see declaration of 'tensorstore::Result<tensorstore::TensorStore<void,-1,tensorstore::ReadWriteMode::dynamic>>'
13>Done building project "tester.vcxproj" -- FAILED.

dzenanz avatar Oct 25 '22 16:10 dzenanz

My attempt was inspired by https://github.com/google/tensorstore/blob/f6da8b5696a04cb6f30fab07183756d0d67d5eaa/tensorstore/driver/zarr/driver_test.cc#L175.

dzenanz avatar Oct 25 '22 16:10 dzenanz

tensorstore::Open returns a Future<TensorStore<>>; calling result() gives you a Result<TensorStore<>>, which holds either a TensorStore<> value (indicating success) or an error absl::Status. If you instead call value(), you will get a plain TensorStore<> object:

auto store = tensorstore::Open({{"driver", "zarr"}, {"kvstore", {{"driver", "file"}, {"path", path}}}}, ...).value();

jbms avatar Oct 25 '22 18:10 jbms

I think I have an example which "works", but it does not open example zarr files.

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto status = openFuture.result().status();
  if (status.ok())
  {
    std::cout << "status OK";
    auto store = openFuture.value();
    std::cout << store.domain().shape();
  }
  else
  {
    std::cout << "status BAD\n" << status;
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}
status BAD
NOT_FOUND: Error opening "zarr" driver: Metadata at local file "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zarray" does not exist [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{}},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/\"},\"recheck_cached_data\":false,\"recheck_cached_metadata\":false}']
C:\Misc\Tester\Debug\tester.exe (process 16716) exited with code 1.

dzenanz avatar Oct 25 '22 21:10 dzenanz

I think the issue is related to the OME-Zarr file structure. An OME-Zarr store can actually contain multiple datasets (each of them a plain zarr array). The top-level directory has a .zattrs file that lists the dataset names (the actual zarr arrays). You can parse that JSON file to get a dataset name, append it to your path, and then read it via tensorstore. In your case, one of these paths would be C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0.

sameeul avatar Oct 25 '22 22:10 sameeul

Sadly, no. If I use C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0 as the path, I get a crash:

C:\Misc\Tester\_deps\tensorstore-src\tensorstore/util/result.h:506: CHECK failed: !has_value()
C:\Misc\Tester\Debug\tester.exe (process 11244) exited with code 3.

dzenanz avatar Oct 26 '22 12:10 dzenanz

Here the error is actually that there is no error: Result::status() currently can only be used if the result is in an error state --- we should fix that, though.

To fix your example:

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    std::cout << "status OK";
    auto store = result.value();
    std::cout << store.domain().shape();
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}

jbms avatar Oct 26 '22 17:10 jbms

Thanks, this is already helpful.

Some syntax help geared towards interoperability would be significant:

  • How to get the domain's origin and shape into std::vectors?
  • How to get the underlying element type (dtype)?
  • How to get the entire metadata as nlohmann::json?
  • What is the C++ equivalent of x = dataset_3d[15000:15100, 15000:15100, 20000]?
  • How to get the buffer pointer (pointer to the first element in memory)? I assume this is possible only for the entire currently buffered chunk, and not its sub-views. Or, an equivalent question: how to get the elements of an array into an std::vector<dtype>?
  • How to write an n-dimensional array with a shape given in an std::vector and a const dtype * buffer?

Direct answers are preferable, but even pointers to the most similar/relevant code currently in the examples or tests would be good. https://github.com/google/tensorstore/blob/f6da8b5696a04cb6f30fab07183756d0d67d5eaa/tensorstore/driver/zarr/driver_test.cc is 3000 lines long and heavily oriented towards testing; the other examples aren't oriented towards teaching interoperability either. I am trying to convert these 150 lines from using netCDF's NCZarr to using tensorstore.

dzenanz avatar Oct 26 '22 22:10 dzenanz

Thanks, this is already helpful.

Some syntax help geared towards interoperability would be significant. How to get domain's origin and shape into std::vectors?

auto shape_span = store.domain().shape();
std::vector<int64_t> shape(shape_span.begin(), shape_span.end());
// similar for origin

How to get underlying element type (dtype)?

tensorstore::DataType dtype = store.dtype();
if (dtype == tensorstore::dtype_v<int32_t>) { /* ... */ }

How to get entire metadata as nlohmann::json?

TensorStore doesn't specifically support zarr user-defined metadata, so this relies on its generic "json" driver:

auto attrs_store = tensorstore::Open<::nlohmann::json, 0>({{"driver", "json"}, {"kvstore", {{"driver", "file"}, {"path", ".../.zattrs"}}}}).result().value();

// Sets attrs_array to a rank-0 array of ::nlohmann::json
auto attrs_array_result = tensorstore::Read(attrs_store).result();

::nlohmann::json attrs;
if (attrs_array_result.ok()) {
  attrs = attrs_array_result.value()();
} else if (absl::IsNotFound(attrs_array_result.status())) {
  attrs = ::nlohmann::json::object_t();
} else {
  return attrs_array_result.status();
}

What is the C++ equivalent of x = dataset_3d[15000:15100, 15000:15100, 20000]?

auto x = tensorstore::Read<tensorstore::zero_origin>(dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000)).result().value();

How to get buffer pointer (pointer to first element in memory)? I assume this is possible only for the entire currently buffered chunk, and not its sub-views. Or an equivalent question: how to get elements of an array into an std::vector<dtype>?

Following example above:

void *ptr = x.data();
auto dtype = x.dtype();

You can also read directly into an std::vector by adapting it into a tensorstore::Array:

std::vector<int32_t> vec(100 * 100);
auto arr = tensorstore::Array(vec.data(), {100, 100}, tensorstore::c_order);
tensorstore::Read(dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000), tensorstore::UnownedToShared(arr)).value();

How to write an n-dimensional array with shape given in std::vector and const dtype * buffer?

std::vector<int64_t> shape{100, 100};
const int32_t *buffer = ...;
auto arr = tensorstore::Array(buffer, shape, tensorstore::c_order);
tensorstore::Write(tensorstore::UnownedToShared(arr), dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000)).value();

jbms avatar Oct 26 '22 23:10 jbms

This is super-helpful, thank you. Updated example:

#include "tensorstore/context.h"
#include "tensorstore/open.h"
#include "tensorstore/index_space/dim_expression.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/test/zarr_implementations/examples/zarr.zr";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    auto store = result.value();
    auto domain = store.domain();
    std::cout << "domain.shape(): " << domain.shape() << std::endl;
    std::cout << "domain.origin(): " << domain.origin() << std::endl;
    auto shape_span = store.domain().shape();

    std::vector<int64_t> shape(shape_span.begin(), shape_span.end());

    tensorstore::DataType dtype = store.dtype();
    std::cout << "dtype: " << dtype << std::endl;
    if (dtype == tensorstore::dtype_v<uint16_t>)
    {
      auto x = tensorstore::Read<tensorstore::zero_origin>(store).result().value();

      auto * p = reinterpret_cast<uint16_t *>(x.data());
      std::cout << "p: " << *p << " " << p[1] << " " << p[2] << " " << p[3] << " " << p[4] << std::endl;
    }
    else
    {
      std::cerr << "Unsupported dtype";
      return EXIT_FAILURE;
    }
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }


  // JSON uses a separate driver
  auto attrs_store =
    tensorstore::Open<::nlohmann::json, 0>(
      { { "driver", "json" },
        { "kvstore", { { "driver", "file" }, { "path", "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zattrs" } } } })
      .result()
      .value();

  // Sets attrs_array to a rank-0 array of ::nlohmann::json
  auto attrs_array_result = tensorstore::Read(attrs_store).result();

  ::nlohmann::json attrs;
  if (attrs_array_result.ok())
  {
    attrs = attrs_array_result.value()();
    std::cout << "attrs: " << attrs << std::endl;
  }
  else if (absl::IsNotFound(attrs_array_result.status()))
  {
    attrs = ::nlohmann::json::object_t();
  }
  else
  {
    std::cout << "Error: " << attrs_array_result.status();
  }

  return EXIT_SUCCESS;
}

If I attempt to customize the read region:

      auto dimSpec = tensorstore::Dims(0).HalfOpenInterval(0, shape[0]);
      for (unsigned d = 1; d < shape.size(); ++d)
      {
        dimSpec = dimSpec | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2);
      }
      auto x = tensorstore::Read<tensorstore::zero_origin>(store | dimSpec).result().value();

I get a compile error:

13>C:\Misc\Tester\tester.cpp(38,83): error C2678: binary '|': no operator found which takes a left-hand operand of type 'tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>' (or there is no acceptable conversion)
13>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\include\cstddef(42,27): message : could be 'std::byte std::operator |(const std::byte,const std::byte) noexcept' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/data_type.h(376,42): message : or       'tensorstore::DataTypeConversionFlags tensorstore::operator |(tensorstore::DataTypeConversionFlags,tensorstore::DataTypeConversionFlags)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/open_mode.h(72,27): message : or       'tensorstore::OpenMode tensorstore::operator |(tensorstore::OpenMode,tensorstore::OpenMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/open_mode.h(117,32): message : or       'tensorstore::ReadWriteMode tensorstore::operator |(tensorstore::ReadWriteMode,tensorstore::ReadWriteMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/index_space/alignment.h(55,41): message : or       'tensorstore::DomainAlignmentOptions tensorstore::operator |(tensorstore::DomainAlignmentOptions,tensorstore::DomainAlignmentOptions)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/resize_options.h(55,36): message : or       'tensorstore::ResolveBoundsMode tensorstore::operator |(tensorstore::ResolveBoundsMode,tensorstore::ResolveBoundsMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/resize_options.h(128,29): message : or       'tensorstore::ResizeMode tensorstore::operator |(tensorstore::ResizeMode,tensorstore::ResizeMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\tester.cpp(38,83): message : while trying to match the argument list '(tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>, tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>)'

Why can't I do this index composition outside of a function call? And what is the proper syntax to accomplish this?

dzenanz avatar Oct 27 '22 16:10 dzenanz

A TensorStore object logically contains:

  • Driver pointer
  • ReadWriteMode
  • Transaction (unused here)
  • IndexTransform (https://google.github.io/tensorstore/index_space.html#index-transform)

tensorstore::Dims(0).HalfOpenInterval(0, shape[0]) is a DimExpression. It can be called as a function on an IndexTransform, and also on other supported objects like TensorStore objects that contain an IndexTransform, in which case it just applies to the contained IndexTransform.

The operator| "pipeline operator" syntax is just a convenience syntax for calling the right hand side as a function:

store = (std::move(store) | tensorstore::Dims(0).HalfOpenInterval(0, shape[0])).value();

is equivalent to

store = tensorstore::Dims(0).HalfOpenInterval(0, shape[0])(std::move(store)).value();

However, as you noted, the operator| "pipeline operator" is not associative, and TensorStore doesn't include a type that can directly hold a sequence of (unapplied) DimExpression objects. In principle you could store them in e.g. std::vector<std::function<Result<TensorStore<>> (TensorStore<>)>>, but additionally, you shouldn't normally let DimExpression objects outlive the full expression in which they are constructed, because they can easily hold dangling references to temporaries.

Instead, if you want to compose multiple indexing operations without applying them directly to a TensorStore object, you can use an IndexTransform object:

tensorstore::IndexTransform<> transform = tensorstore::IdentityTransform(store.domain());
for (unsigned d = 0; d < shape.size(); ++d)
{
    transform = (std::move(transform) | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2)).value();
}

auto x = tensorstore::Read<tensorstore::zero_origin>(store | transform).value();

jbms avatar Oct 27 '22 19:10 jbms

I got my example working. Thank you.

#include "tensorstore/context.h"
#include "tensorstore/open.h"
#include "tensorstore/index_space/dim_expression.h"

#include "itkImageFileReader.h"

void
jsonRead()
{
  // JSON uses a separate driver
  auto attrs_store =
    tensorstore::Open<::nlohmann::json, 0>(
      { { "driver", "json" },
        { "kvstore", { { "driver", "file" }, { "path", "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zattrs" } } } })
      .result()
      .value();

  // Sets attrs_array to a rank-0 array of ::nlohmann::json
  auto attrs_array_result = tensorstore::Read(attrs_store).result();

  ::nlohmann::json attrs;
  if (attrs_array_result.ok())
  {
    attrs = attrs_array_result.value()();
    std::cout << "attrs: " << attrs << std::endl;
  }
  else if (absl::IsNotFound(attrs_array_result.status()))
  {
    attrs = ::nlohmann::json::object_t();
  }
  else
  {
    std::cout << "Error: " << attrs_array_result.status();
  }
}

int
exampleRead()
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/test/zarr_implementations/examples/zarr.zr";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    auto store = result.value();
    auto domain = store.domain();
    std::cout << "domain.shape(): " << domain.shape() << std::endl;
    std::cout << "domain.origin(): " << domain.origin() << std::endl;
    auto shape_span = store.domain().shape();

    std::vector<int64_t> shape(shape_span.begin(), shape_span.end());

    tensorstore::DataType dtype = store.dtype();
    std::cout << "dtype: " << dtype << std::endl;
    if (dtype == tensorstore::dtype_v<uint16_t>)
    {
      tensorstore::IndexTransform<> transform = tensorstore::IdentityTransform(store.domain());
      for (unsigned d = 0; d < shape.size(); ++d)
      {
        transform = (std::move(transform) | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2)).value();
      }
      auto x = tensorstore::Read<tensorstore::zero_origin>(store | transform).value();

      auto * p = reinterpret_cast<uint16_t *>(x.data());
      std::cout << "p: " << *p << " " << p[1] << " " << p[2] << " " << p[3] << " " << p[4] << std::endl;
    }
    else
    {
      std::cerr << "Unsupported dtype";
      return EXIT_FAILURE;
    }
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}

void
exampleWrite()
{
  using ImageType = itk::Image<short, 3>;
  auto                 image = itk::ReadImage<ImageType>("C:/a/DzZ_T1.mha");
  auto                 size = image->GetLargestPossibleRegion().GetSize(); // ijk = 512x512x12
  std::vector<int64_t> shape(size.rbegin(), size.rend());                  // kji

  auto context = tensorstore::Context::Default();
  auto dataset = tensorstore::Open(
                   {
                     { "driver", "zarr" },
                     { "kvstore", { { "driver", "file" }, { "path", "C:/a/DzZ_T1.zarr" } } },
                     { "metadata",
                       {
                         { "compressor", { { "id", "blosc" } } },
                         { "dtype", "<i2" },
                         { "shape", shape },
                         // { "dimension_separator", "." },
                       } },
                   },
                   context,
                   tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                   tensorstore::ReadWriteMode::read_write)
                   .result();

  auto arr = tensorstore::Array(image->GetBufferPointer(), shape, tensorstore::c_order);

  auto writeFuture = tensorstore::Write(tensorstore::UnownedToShared(arr), dataset);
  auto result = writeFuture.result();
  if (result.ok())
  {
    std::cout << "Written successfully\n";
  }
  else
  {
    std::cout << "Write error:\n" << result.status();
    return;
  }
}

int
main(int argc, char ** argv)
{
  jsonRead();
  exampleWrite();
  return exampleRead();
}

I now realize that zip is not on the list of KvStore drivers. What would be the best way to achieve functionality similar to zarr.storage.ZipStore? What would be the easiest way?

dzenanz avatar Oct 27 '22 20:10 dzenanz

Do you need read/write support for zip or just read-only?

Best way would be to implement a "zip" kvstore driver in TensorStore (which functions as an adapter on top of a "base" kvstore) --- that has been on our todo list.

Easiest way --- would be to do the zipping and unzipping externally, using a temp directory.

jbms avatar Oct 27 '22 20:10 jbms

Ideally, both read and write. The main need for zip store is for use in JavaScript and WebAssembly, where a single file can be passed around as a blob, and a directory is very inconvenient (to say the least).

dzenanz avatar Oct 27 '22 21:10 dzenanz

Clearly zip may indeed be useful in some cases. But one of the main use cases for the zarr format is to enable efficient "partial I/O" of large arrays, meaning just reading the desired portion as needed rather than always reading the entire thing. Passing around the entire array as a zip file defeats that purpose. Instead you could retrieve individual chunks on demand from a server, e.g. as done in https://github.com/google/neuroglancer

jbms avatar Oct 28 '22 22:10 jbms

I completely agree. But when developing a web application, starting off with a zip file is a lot easier than having to immediately deal with cloud storage and its complexity (authentication, tolerating network faults etc.).

dzenanz avatar Oct 28 '22 23:10 dzenanz

Read and write zip support is important for working with these datasets: it reduces the number of inodes on local filesystems, and likewise the number of objects to manage in JavaScript / WebAssembly.

Passing around the entire array as a zip file defeats that purpose.

We can still retrieve parts of the data on demand through HTTP range requests.

thewtex avatar Oct 31 '22 12:10 thewtex

We have found minizip-ng to be a good zip library.

thewtex avatar Oct 31 '22 13:10 thewtex

With CMake options:

option(MZ_COMPAT "Enables compatibility layer" OFF)
option(MZ_ZLIB "Enables ZLIB compression" OFF)
option(MZ_BZIP2 "Enables BZIP2 compression" OFF)
option(MZ_LZMA "Enables LZMA & XZ compression" OFF)
option(MZ_ZSTD "Enables ZSTD compression" OFF)
option(MZ_PKCRYPT "Enables PKWARE traditional encryption" OFF)
option(MZ_WZAES "Enables WinZIP AES encryption" OFF)
option(MZ_OPENSSL "Enables OpenSSL encryption" OFF)
option(MZ_LIBBSD "Build with libbsd for crypto random" OFF)
option(MZ_SIGNING "Enables zip signing support" OFF)
option(MZ_ICONV "Enables iconv string encoding conversion library" OFF)

thewtex avatar Oct 31 '22 13:10 thewtex

This function works for writing JSON:

// JSON file path, e.g. "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zgroup"
void
writeJson(nlohmann::json json, std::string path)
{
  auto attrs_store = tensorstore::Open<nlohmann::json, 0>(
                       { { "driver", "json" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                       tsContext,
                       tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                       tensorstore::ReadWriteMode::read_write)
                       .result()
                       .value();
  auto writeFuture = tensorstore::Write(tensorstore::MakeScalarArray(json), attrs_store);

  auto result = writeFuture.result();
  if (!result.ok())
  {
    itkGenericExceptionMacro(<< "There was an error writing metadata to file '" << path
                             << ". Error details: " << result.status());
  }
}

Can I make it pretty-print the JSON instead of writing the minified version?

dzenanz avatar Nov 11 '22 23:11 dzenanz

Thanks to guidance from this issue I have begun to integrate OME-Zarr in my volume viewer as seen here: https://github.com/allen-cell-animated/agave/pull/73 https://github.com/allen-cell-animated/agave/pull/73/files#diff-c2505cb0ef29a0b26d0eedacbc8049b5b3dd87214baa7ff4ae72a5bb23d7168f

toloudis avatar Jan 30 '23 17:01 toloudis

I see you gave netCDF a shot as well, @toloudis 😄

dzenanz avatar Jan 30 '23 20:01 dzenanz

I see you gave netCDF a shot as well, @toloudis 😄

Nothing against it; I did attempt it, but found that the CMake integration, and just getting it to build, was hard. TensorStore just sort of worked right away, with not too many lines of CMake.

toloudis avatar Jan 30 '23 21:01 toloudis

// JSON file path, e.g. "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zgroup"
void
writeJson(nlohmann::json json, std::string path)
{
  auto attrs_store = tensorstore::Open<nlohmann::json, 0>(
                       { { "driver", "json" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                       tsContext,
                       tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                       tensorstore::ReadWriteMode::read_write)
                       .result()
                       .value();
  auto writeFuture = tensorstore::Write(tensorstore::MakeScalarArray(json), attrs_store);

  auto result = writeFuture.result();
  if (!result.ok())
  {
    itkGenericExceptionMacro(<< "There was an error writing metadata to file '" << path
                             << ". Error details: " << result.status());
  }
}

I tried to replicate the writeJson example, but it gives me the following error:

/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:945:55:   required from 'tensorstore::Array<ElementTagType, Rank, OriginKind, LayoutContainerKind>::Storage::Storage(PointerInit&&, LayoutInit&&) [with PointerInit = std::shared_ptr<nlohmann::basic_json<> >; LayoutInit = tensorstore::StridedLayout<0>; ElementTagType = tensorstore::Shared<nlohmann::basic_json<> >; long int Rank = 0; tensorstore::ArrayOriginKind OriginKind = tensorstore::ArrayOriginKind::zero; tensorstore::ContainerKind LayoutContainerKind = tensorstore::ContainerKind::container]'
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:526:9:   required from 'tensorstore::Array<ElementTagType, Rank, OriginKind, LayoutContainerKind>::Array(SourcePointer, SourceLayout&&) [with SourcePointer = std::shared_ptr<nlohmann::basic_json<> >; SourceLayout = tensorstore::StridedLayout<0>; std::enable_if_t<IsPairImplicitlyConvertible<SourcePointer, SourceLayout, tensorstore::ElementPointer<SourceTag>, tensorstore::StridedLayout<R, O, C> > >* <anonymous> = 0; ElementTagType = tensorstore::Shared<nlohmann::basic_json<> >; long int Rank = 0; tensorstore::ArrayOriginKind OriginKind = tensorstore::ArrayOriginKind::zero; tensorstore::ContainerKind LayoutContainerKind = tensorstore::ContainerKind::container]'
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:1147:52:   required from 'tensorstore::SharedArray<Element, 0> tensorstore::MakeScalarArray(const Element&) [with Element = nlohmann::basic_json<>; SharedArray<Element, 0> = Array<Shared<nlohmann::basic_json<> >, 0, tensorstore::ArrayOriginKind::zero, tensorstore::ContainerKind::container>]'
/home/TensorStore/code/write_file.cpp:33:69:   required from here
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/util/element_pointer.h:331:56: error: call of overloaded 'static_pointer_cast<tensorstore::ElementPointer<tensorstore::Shared<nlohmann::basic_json<> > >::Element>(std::enable_if_t<true, std::shared_ptr<nlohmann::basic_json<> > >)' is ambiguous
  331 |                  internal::static_pointer_cast<Element>(
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  332 |                      internal_element_pointer::ConvertPointer<Pointer>(
      |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  333 |                          std::forward<SourcePointer>(pointer)))) {}
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/internal/intrusive_ptr.h:122,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context_impl_base.h:35,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context.h:30,
                 from /home/TensorStore/code/write_file.cpp:1:
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/internal/memory.h:65:27: note: candidate: 'std::shared_ptr<_Tp> tensorstore::internal::static_pointer_cast(std::shared_ptr<_Tp>&&) [with T = nlohmann::basic_json<>]'
   65 | inline std::shared_ptr<T> static_pointer_cast(std::shared_ptr<T>&& other) {
      |                           ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/memory:77,
                 from /home/TensorStore/code/build/_deps/absl-src/absl/status/internal/status_internal.h:17,
                 from /home/TensorStore/code/build/_deps/absl-src/absl/status/status.h:59,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context.h:28:
/usr/include/c++/12/bits/shared_ptr.h:745:5: note: candidate: 'std::shared_ptr<_Tp> std::static_pointer_cast(shared_ptr<_Tp>&&) [with _Tp = nlohmann::basic_json<>; _Up = nlohmann::basic_json<>]'
  745 |     static_pointer_cast(shared_ptr<_Up>&& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/shared_ptr.h:700:5: note: candidate: 'std::shared_ptr<_Tp> std::static_pointer_cast(const shared_ptr<_Tp>&) [with _Tp = nlohmann::basic_json<>; _Up = nlohmann::basic_json<>]'
  700 |     static_pointer_cast(const shared_ptr<_Up>& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/bits/shared_ptr.h:53:
/usr/include/c++/12/bits/shared_ptr_base.h:1929:5: note: candidate: 'std::__shared_ptr<_Tp1, _Lp> std::static_pointer_cast(const __shared_ptr<_Tp2, _Lp>&) [with _Tp = nlohmann::basic_json<>; _Tp1 = nlohmann::basic_json<>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]'
 1929 |     static_pointer_cast(const __shared_ptr<_Tp1, _Lp>& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~

I am using the latest version 0.1.31 of TensorStore, running on Ubuntu 22.04.

crl123 avatar Feb 13 '23 03:02 crl123