
[C]Worthy OAE dataset example

Open TomNicholas opened this issue 8 months ago • 0 comments

  • [x] Closes #132
  • [ ] Changes are documented in docs/releases.rst

A list of everything about this demo that is janky and should be fixed before merging, in order of most to least janky:

  • [ ] Functions fail intermittently for unknown reasons, either with an HTTP error (which makes some sense) or with an out-of-memory error, which makes no sense
  • [ ] Adding per-task retries might fix this (https://github.com/zarr-developers/VirtualiZarr/pull/575), however Lithops retries don't work properly, see https://github.com/lithops-cloud/lithops/issues/1429
  • [x] https://github.com/zarr-developers/VirtualiZarr/issues/574
  • [ ] Needs the option to cache the full file (#564) to be merged, see #625
  • [ ] Even then still relies on the unreleased develop branch of VirtualiZarr
  • [ ] Can't glob for filepaths in bucket, so we need https://github.com/zarr-developers/VirtualiZarr/issues/569
  • [ ] combine_by_coords didn't work because it triggered a reindex, and I don't know why
  • [x] https://github.com/zarr-developers/numcodecs/issues/744
  • [ ] ManifestStore can't load scalars #530
  • [ ] Have to rename paths to non-http URLs, because @maxrjones's cache PR generates http URLs, but Icechunk can't store them yet (https://github.com/earth-mover/icechunk/issues/526) - EDIT: for cases that don't require auth this should now work, but is untested
  • [ ] I had to add the --provenance kwarg to my local docker build and I don't actually know if I need that
  • [ ] I have to manually paste my lithops credentials into the .lithops_config because they seem not to be discovered when set as environment variables in the notebook
  • [x] The open_virtual_mfdataset function is not documented, but was added in https://github.com/zarr-developers/VirtualiZarr/pull/349 (docs added in #590)
  • [x] ~~open_virtual_mfdataset parallel kwarg is different to the parallel kwarg for xr.open_mfdataset, because the generalization here should be merged upstream https://github.com/pydata/xarray/pull/9932~~ (EDIT: This one isn't important)

I have workarounds for basically all of them, but they should all be understood and fixed.
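
For the intermittent failures at the top of the list, a client-side retry wrapper is one possible stopgap while Lithops retries are unreliable. This is only a minimal sketch and not necessarily what the demo does: `call_with_retries` and its parameters are hypothetical names, and which exception types count as transient is an assumption that depends on the HTTP client in use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[..., T],
    *args,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    **kwargs,
) -> T:
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper: `retry_on` is an assumption about which errors
    are transient; adjust it to the exceptions actually observed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            # Re-raise the original error once the attempts are used up.
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Wrapping the per-file function this way (for example `call_with_retries(open_one_file, path)`, where `open_one_file` stands in for whatever is mapped over the files) only hides the failures; it does not explain the out-of-memory errors.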

TomNicholas, Apr 22 '25 19:04