Jeremy Maitin-Shepard comments

479 comments by Jeremy Maitin-Shepard

Breaking Change: Dropping support for Bazel+MSVC

Please reconsider dropping support. While I am well aware that supporting Bazel+MSVC is frustrating, Protobuf is a transitive dependency of a massive number of projects, and especially a large fraction...

Incrementally-populated Zarr Arrays

I certainly agree that there are a lot of use cases for efficiently querying which chunks have data, and possibly storing additional per-chunk statistics. For arrays with a very large...

Incrementally-populated Zarr Arrays

Perhaps you can clarify what exactly you are trying to accomplish, independent of any particular solution? In https://github.com/zarr-developers/zarr-specs/issues/300#issuecomment-2204735276 you mention lazily writing an array as individual chunks are requested. This seems...

Initializing a group or array is not thread-safe, even with mode='w'

I think "w+" may not be the best name for this option, since "w+" as an fopen option means to truncate the existing data. In tensorstore we have an option...

Restructuring multiprocessing

I recently ran into this issue myself. The basic problem I observed is as follows: when the number of documents is larger than 10 times the number of workers, there will...

[v3] Structured dtype support

Tensorstore supports structured data types for zarr v2 but not v3. I can imagine that structured data types are convenient for some use cases but they also introduce a number...

[v3] Structured dtype support

> Good questions! Here are some more comments based on your thoughts: > > * Regarding AoS or SoA, numpy supports both, so we could still support both in Zarr....

[v3] Structured dtype support

> Hm looking at it again, seems like byte codec could also define the type configuration? In that case what could be the dtype metadata and when would the casting...

[v3] Structured dtype support

> > Right, not supported in numpy but having a collection of related fields, some of which are scalars and some of which are strings, is fairly common. Any array...

question about writing parallel and group handling

Tensorstore automatically handles chunks in parallel, so you can just issue the write from a single thread and all relevant chunks will be handled in parallel. Groups currently aren't supported...