🎀 Awesome Zarr resources

Zarr is a cloud-native, chunked, compressed, and hierarchical array data format.
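To make "chunked" concrete, here is a minimal sketch, using only the Python standard library, of how an array's shape and chunk shape determine the grid of chunks and the key under which each chunk is stored. It assumes the dot-separated key layout (`0.0`, `0.1`, ...) that Zarr V2 uses by default; the helper name `chunk_keys` is hypothetical and not part of any Zarr library.

```python
import itertools
import math


def chunk_keys(shape, chunks, separator="."):
    """List the chunk keys for an array of `shape` split into `chunks`."""
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    # One key per chunk, built from the chunk's index along each dimension.
    return [
        separator.join(map(str, idx))
        for idx in itertools.product(*(range(n) for n in grid))
    ]


# A 100 x 100 array with 30 x 50 chunks gives a 4 x 2 chunk grid.
keys = chunk_keys(shape=(100, 100), chunks=(30, 50))
print(len(keys), keys[:3])
```

Because each chunk lives under its own key, a reader can fetch only the chunks that overlap the region it needs, which is what makes the format a good fit for object storage.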

Contents

Resources

  • Existing resources
  • Introductory videos
  • Zarr V3
  • Libraries
  • Platforms
  • Articles
  • Talks & Videos
  • Life sciences

Topics

  • Zarr & other array data formats
  • GeoZarr
  • Zarr & STAC

Resources

Existing resources

The Zarr website is already an excellent resource for learning about Zarr and its ecosystem. This list is intended to complement the website with a curated and opinionated list of resources.

This list focuses on Geo/Earth Sciences, but is not limited to that domain.

Existing lists

Lists

Introductory videos

Introductory talks YouTube playlist

Two excellent and up-to-date introductory talks:

Zarr V3

Zarr V3 is the upcoming version of Zarr. It is a major update that will bring many new features and improvements.

If you're getting into Zarr now, it might be a good idea to start with Zarr V3.

For an excellent in-depth overview, see the ESIP series of talks.

Libraries

This list contains libraries that directly relate to Zarr in some way.

For implementations of Zarr, see Zarr Implementations.

Storage & I/O

ETL

Developer-oriented

  • numcodecs: Compression and transformation codecs used by Zarr
  • pydantic-zarr: Pydantic models for Zarr objects
  • traverzarr: Traversing Zarr JSON as if it's a filesystem
  • zarr_checksum: Calculating checksum information from Zarr
  • zarrdump: Describe Zarr stores from the command line

Visualization: For visualization tools & libraries, see the Visualization section.

Kerchunk

Kerchunk allows you to efficiently read chunked data formats such as GRIB, NetCDF, and COGs by exposing them as a Zarr store.
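The underlying idea is a set of references: each Zarr chunk key maps to a byte range inside an existing file, so the original data is never copied or rewritten. The following is a minimal sketch of that idea using only the Python standard library. It does not use the Kerchunk API; the file contents, the key names, and the `read_chunk` helper are made up for illustration, and the `[file, offset, length]` entries are a simplified rendering of a reference set.

```python
import os
import tempfile

# Write a stand-in "archival" file. In practice this would be an existing
# NetCDF, GRIB, or GeoTIFF file that is left untouched.
path = os.path.join(tempfile.mkdtemp(), "legacy.bin")
with open(path, "wb") as f:
    f.write(b"HEADERchunk-Achunk-B")

# Reference set: chunk key -> [file, byte offset, byte length].
# Keys and layout are simplified and hypothetical.
refs = {
    "temp/0": [path, 6, 7],
    "temp/1": [path, 13, 7],
}


def read_chunk(refs, key):
    """Fetch the bytes for one chunk key with a ranged read."""
    target, offset, length = refs[key]
    with open(target, "rb") as f:
        f.seek(offset)
        return f.read(length)


print(read_chunk(refs, "temp/1"))
```

With remote files, the same lookup would translate into an HTTP range request instead of a local seek, which is why a reference set can make legacy formats behave like a cloud-native store.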

Talks and tutorials

Future of Kerchunk

In the future, Kerchunk will be split into upstream functionality in Zarr itself and a new VirtualiZarr package.

Platforms

  • Arraylake: a data lake platform based on Zarr. The company, Earthmover, was started by core Zarr developers.

Articles

Talks & Videos

Existing lists

Talks

Life sciences

Zarr has seen great adoption in the life sciences domain.

  • bdz: Zarr-based format for storing quantitative biosystems dynamics data
  • ome-zarr-py: Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
  • ez_zarr: Easy, high-level access to OME-Zarr filesets
  • hdmf-zarr: Zarr I/O backend for HDMF

Talks and resources

Visualization

Most visualization work for Zarr has come from the bioimaging community:

Topics

Zarr & other array data formats

For a general overview, see

Essentially all other common array data formats can be exposed as Zarr. See Kerchunk.

NetCDF & HDF5

Zarr, NetCDF, and HDF5 are three separate data formats that nonetheless relate to each other in multiple ways.

Resources

COG: Cloud-Optimized GeoTIFF

N5

Zarr and N5 are two similar array data formats that share common goals and development.

The Zarr V3 spec aims to provide a common implementation target for both formats (sources: 1, 2).

Links

GeoZarr

GeoZarr is a proposal for a Zarr-based geospatial data format that is being submitted as an OGC standard.

GeoZarr will define a metadata convention for Zarr stores that contain geospatial data.

It will also define the relationship of Zarr with CF and NetCDF.

Links

Zarr & STAC

STAC provides a common structure for describing and cataloging spatiotemporal assets.

With its hierarchical structure and key-value metadata support, Zarr's capabilities overlap significantly with STAC.

The communities have not yet converged on a canonical representation of Zarr datasets through STAC.

Today, a good example of exposing Zarr in STAC is the Planetary Computer.

More discussion & Related links

In the future, the Zarr V3 Spec and GeoZarr convention will likely enable greater interoperability between STAC and Zarr.