Propose design doc and roadmap for VirtualiZarr 2.0

Open maxrjones opened this issue 8 months ago • 5 comments

There are a lot of interconnected technical design issues open. This design doc should help us discuss solutions that affect multiple issues.

The key differences from the current design are using protocols more than ABCs, defining a "Reader" as a protocol required by VirtualiZarr that can be used by both backends and the ManifestStore, and orienting backends around ManifestStore creation rather than Dataset creation.
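To make the protocol-versus-ABC point concrete, here is a minimal sketch of structural typing in Python. The `Reader` protocol, its `read_range` method, and `InMemoryReader` are hypothetical names used only for illustration; they are not the interfaces proposed in the design doc.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reader(Protocol):
    """Hypothetical protocol: anything that can return a byte range from a file."""

    def read_range(self, path: str, start: int, length: int) -> bytes: ...


class InMemoryReader:
    """Satisfies Reader structurally; it does not inherit from Reader."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def read_range(self, path: str, start: int, length: int) -> bytes:
        return self._files[path][start : start + length]


def first_bytes(reader: Reader, path: str, n: int) -> bytes:
    """Accepts any object with a matching read_range method."""
    return reader.read_range(path, 0, n)


reader = InMemoryReader({"a.bin": b"hello world"})
header = first_bytes(reader, "a.bin", 5)
```

With an ABC, `InMemoryReader` would have to subclass or register against the base class. With a protocol, a third-party class that happens to have the right method can be passed in unchanged.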

Just a starting point building off ideas shared by @kylebarron and @sharkinsspatial.

maxrjones avatar Apr 25 '25 15:04 maxrjones

@kylebarron @chuckwondo @d-v-b I wasn't able to request reviews from you all but it'd be great to get your thoughts on this if you have time

maxrjones avatar Apr 25 '25 15:04 maxrjones

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 89.34%. Comparing base (ff1ddb4) to head (81705e9).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #568      +/-   ##
===========================================
+ Coverage    88.80%   89.34%   +0.53%     
===========================================
  Files           34       34              
  Lines         1948     1943       -5     
===========================================
+ Hits          1730     1736       +6     
+ Misses         218      207      -11     

see 4 files with indirect coverage changes


codecov[bot] avatar Apr 26 '25 16:04 codecov[bot]

Thanks for putting this together @maxrjones. It is super helpful to have these conversations synthesized into this design doc, especially with the associated issues linked. Take my comments with a grain of salt, as I haven't been involved directly in these discussions.

abarciauskas-bgse avatar May 01 '25 14:05 abarciauskas-bgse

I've resolved all the outstanding comments. The only substantial difference from what others have proposed (e.g., https://github.com/zarr-developers/VirtualiZarr/issues/553#issuecomment-2852243601) is that the top-level open_virtual_dataset and the backends accept both a ReadableFile and an ObjectReader. I don't know a way around this, since ManifestStore should use get_ranges_async in order to be performant, but many libraries rely on a BufferedFile-like interface.

A minor difference is a stricter distinction between a Reader (something that gets bytes from files) and a Backend (something that interprets a file to construct a ManifestStore).
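A rough sketch of how the two interfaces and the Reader/Backend split could fit together is below. Everything here is an assumption made for illustration: the protocol signatures, `MemoryReader`, the `BufferedFile` adapter, the toy file format, and the use of a plain dict in place of a real ManifestStore are not taken from the design doc or from any library.

```python
import asyncio
from typing import Protocol, Sequence


class ObjectReader(Protocol):
    """Hypothetical: batched async range requests, the access pattern ManifestStore wants."""

    async def get_ranges_async(
        self, path: str, starts: Sequence[int], ends: Sequence[int]
    ) -> list[bytes]: ...


class ReadableFile(Protocol):
    """Hypothetical: the seek/read interface many parsing libraries expect."""

    def seek(self, offset: int) -> int: ...
    def read(self, size: int) -> bytes: ...


class MemoryReader:
    """A Reader: it only fetches bytes and knows nothing about file formats."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    async def get_ranges_async(self, path, starts, ends):
        data = self._files[path]
        return [data[s:e] for s, e in zip(starts, ends)]


class BufferedFile:
    """Adapts an ObjectReader to seek/read by issuing one range request per read."""

    def __init__(self, reader: ObjectReader, path: str) -> None:
        self._reader, self._path, self._pos = reader, path, 0

    def seek(self, offset: int) -> int:
        self._pos = offset
        return offset

    def read(self, size: int) -> bytes:
        start = self._pos
        (chunk,) = asyncio.run(
            self._reader.get_ranges_async(self._path, [start], [start + size])
        )
        self._pos += len(chunk)
        return chunk


def toy_backend(file: ReadableFile) -> dict[str, tuple[int, int]]:
    """A Backend: interprets the file and returns chunk key -> (offset, length).

    Toy format (invented here): a 1-byte header holding the chunk size,
    followed by two chunks of that size.
    """
    file.seek(0)
    size = file.read(1)[0]
    return {"0": (1, size), "1": (1 + size, size)}


reader = MemoryReader({"toy.bin": bytes([3]) + b"abcdef"})
manifest = toy_backend(BufferedFile(reader, "toy.bin"))

# The manifest consumer can then fetch all chunks in a single batched call.
starts = [offset for offset, _ in manifest.values()]
ends = [offset + length for offset, length in manifest.values()]
chunks = asyncio.run(reader.get_ranges_async("toy.bin", starts, ends))
```

The backend only ever touches the seek/read interface, while chunk retrieval goes through the batched async call, which is why both interfaces end up being accepted.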

maxrjones avatar May 06 '25 08:05 maxrjones

Not sure which phase in the roadmap it fits into, but having HTTPS support for virtual stores would be really clutch for a lot of use cases.

norlandrhagen avatar May 15 '25 18:05 norlandrhagen

@TomNicholas would you rather merge this into the repo as documentation or close it as not needed?

maxrjones avatar Jun 16 '25 21:06 maxrjones

It's very of-its-time, and all architectural information should also be present in the public docs (especially the page on data structures), so let's just close.

TomNicholas avatar Jun 17 '25 08:06 TomNicholas

Having a public roadmap would be nice (see also https://github.com/zarr-developers/VirtualiZarr/issues/451) but the roadmap file here is more like a nitty-gritty development checklist for a specific moment.

TomNicholas avatar Jun 17 '25 08:06 TomNicholas