Propose design doc and roadmap for VirtualiZarr 2.0
There are a lot of interconnected technical design issues open. This design doc should help us discuss solutions that impact multiple issues.
The key differences from the current design are using protocols more than ABCs, defining a "Reader" as a protocol required by VirtualiZarr that can be used by backends and the ManifestStore, and orienting backends around ManifestStore creation rather than Dataset creation.
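To make the "protocols more than ABCs" point concrete, here is a minimal sketch of what a structurally typed Reader could look like. The `Reader` method name and signature, `InMemoryReader`, and `read_magic` are all hypothetical and only for illustration; they are not the API proposed in the design doc.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reader(Protocol):
    """Hypothetical protocol: anything that can return a byte range."""

    def get_range(self, path: str, start: int, length: int) -> bytes: ...


class InMemoryReader:
    """Satisfies Reader structurally; note that it does not subclass it."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def get_range(self, path: str, start: int, length: int) -> bytes:
        return self._files[path][start : start + length]


def read_magic(reader: Reader, path: str) -> bytes:
    """Backend-style helper that depends only on the protocol."""
    return reader.get_range(path, 0, 4)


# Any object with a matching get_range method can be passed in.
reader = InMemoryReader({"a.nc": b"\x89HDF\r\n\x1a\n rest"})
magic = read_magic(reader, "a.nc")
```

With an ABC, third-party readers would need to inherit from a VirtualiZarr base class; with a protocol, they only need to provide methods of the right shape.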
Just a starting point building off ideas shared by @kylebarron and @sharkinsspatial.
@kylebarron @chuckwondo @d-v-b I wasn't able to request reviews from you all, but it'd be great to get your thoughts on this if you have time.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 89.34%. Comparing base (ff1ddb4) to head (81705e9).
| Coverage Diff | develop | #568 | +/- |
|---|---|---|---|
| Coverage | 88.80% | 89.34% | +0.53% |
| Files | 34 | 34 | |
| Lines | 1948 | 1943 | -5 |
| Hits | 1730 | 1736 | +6 |
| Misses | 218 | 207 | -11 |
Thanks for putting this together, @maxrjones. It is super helpful to have these conversations synthesized into this design doc, especially with the associated issues linked. Take my comments with a grain of salt, as I haven't been involved directly in these discussions.
I've resolved all the outstanding comments. The only substantial difference from what others have proposed (e.g., https://github.com/zarr-developers/VirtualiZarr/issues/553#issuecomment-2852243601) is that the top-level open_virtual_dataset and the backends accept both a ReadableFile and an ObjectReader. I don't know a way around this, since ManifestStore should use get_ranges_async in order to be performant, but many libraries rely on a BufferedFile-like interface.
A minor difference is a stricter distinction between a Reader (something that gets bytes from files) and a Backend (something that interprets a file to construct a ManifestStore).
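A rough sketch of why both interfaces end up in the signature, assuming a toy file format (one header byte giving the chunk length, followed by fixed-size chunks). Only the names `ReadableFile`, `ObjectReader`, and `get_ranges_async` come from the discussion above; the protocol signatures, `MemoryObjectReader`, `toy_backend`, `fetch_chunks`, and the dict used as a stand-in manifest are hypothetical.

```python
import asyncio
import io
from typing import Protocol, Sequence


class ReadableFile(Protocol):
    """File-like interface that many parsing libraries expect (assumed shape)."""

    def seek(self, offset: int, whence: int = 0) -> int: ...
    def read(self, size: int = -1) -> bytes: ...


class ObjectReader(Protocol):
    """Range-request interface for the store (assumed shape)."""

    async def get_ranges_async(
        self, path: str, starts: Sequence[int], lengths: Sequence[int]
    ) -> list[bytes]: ...


class MemoryObjectReader:
    """Hypothetical in-memory stand-in for an object store client."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    async def get_ranges_async(self, path, starts, lengths):
        data = self._objects[path]
        return [data[s : s + n] for s, n in zip(starts, lengths)]


def toy_backend(path: str, file: ReadableFile) -> dict[str, tuple[str, int, int]]:
    """Backend: interprets the file and records where each chunk lives."""
    file.seek(0)
    chunk_len = file.read(1)[0]  # header byte holds the chunk length
    size = file.seek(0, 2)  # seek to end to find the file size
    n_chunks = (size - 1) // chunk_len
    return {str(i): (path, 1 + i * chunk_len, chunk_len) for i in range(n_chunks)}


async def fetch_chunks(reader: ObjectReader, manifest, keys: Sequence[str]):
    """Store side: fetch several chunks of one file in a single batched call."""
    path = manifest[keys[0]][0]
    starts = [manifest[k][1] for k in keys]
    lengths = [manifest[k][2] for k in keys]
    return await reader.get_ranges_async(path, starts, lengths)


payload = bytes([4]) + b"aaaabbbbcccc"
manifest = toy_backend("toy.bin", io.BytesIO(payload))
chunks = asyncio.run(
    fetch_chunks(MemoryObjectReader({"toy.bin": payload}), manifest, ["0", "2"])
)
```

The backend only needs seek/read to parse metadata, while the store only needs batched range requests to serve chunks, which is why the two interfaces are hard to collapse into one.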
Not sure which phase in the roadmap it fits into, but having HTTPS support for virtual stores would be really clutch for a lot of use cases.
@TomNicholas would you rather merge this into the repo as documentation or close it as not needed?
It's very of-its-time, and all architectural information should also be present in the public docs (especially the page on data structures), so let's just close.
Having a public roadmap would be nice (see also https://github.com/zarr-developers/VirtualiZarr/issues/451) but the roadmap file here is more like a nitty-gritty development checklist for a specific moment.