Option to write many rasters for chunked DataArray?

Open TomAugspurger opened this issue 3 years ago • 3 comments

Currently, .rio.to_raster will generate a single raster, even for chunked DataArrays. For very large Dask Arrays, it might be more useful to write many rasters instead, perhaps one per chunk. This would better align with, e.g., dask.DataFrame.to_csv, which writes one CSV file per partition.

This adds some complexity to how the actual filename is determined, but we can rely on some conventions established in dask / elsewhere to come up with something sensible.
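As a rough illustration of the filename question, here is a minimal sketch that borrows the dask.dataframe.to_csv convention of replacing a `*` in the path with the partition number. It is not part of rioxarray: the helper names `chunk_slices` and `chunk_filename`, the `scene-*.tif` template, and the choice to join block indices with `-` are all assumptions made for this example. It uses only the standard library and takes a tuple shaped like a Dask array's `.chunks` attribute.

```python
from itertools import accumulate, product


def chunk_slices(chunks):
    """Yield (block_index, slices) for every block of a chunked array.

    `chunks` is shaped like a Dask array's `.chunks` attribute,
    e.g. ((2, 2), (3, 3, 1)) for a 4 x 7 array.
    """
    per_dim = []
    for sizes in chunks:
        # Cumulative sums give the stop offset of each chunk along this axis.
        stops = list(accumulate(sizes))
        starts = [stop - size for stop, size in zip(stops, sizes)]
        per_dim.append([slice(a, b) for a, b in zip(starts, stops)])

    index_ranges = [range(len(slices)) for slices in per_dim]
    for block_index in product(*index_ranges):
        yield block_index, tuple(
            per_dim[dim][i] for dim, i in enumerate(block_index)
        )


def chunk_filename(template, block_index):
    """Replace '*' in the template with the block index (hypothetical convention)."""
    return template.replace("*", "-".join(str(i) for i in block_index))


# One (filename, slices) pair per chunk of a 4 x 7 array.
plan = [
    (chunk_filename("scene-*.tif", index), slices)
    for index, slices in chunk_slices(((2, 2), (3, 3, 1)))
]
```

Each (filename, slices) pair could then, in principle, be used to select one block of the DataArray and pass it to .rio.to_raster with that filename, so that no two tasks write to the same file.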

https://discourse.pangeo.io/t/generating-cogs-and-stac-items-from-dataarrays/1913 has some more background information.

This is somewhat related to https://github.com/corteva/rioxarray/issues/432, by providing an alternative that wouldn't need locks.

TomAugspurger, Nov 12 '21 22:11

What about this: https://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_zarr.html

GDAL 3.4 added support for Zarr.

Or do you specifically need GeoTIFF?

snowman2, Nov 12 '21 22:11

In this case, specifically COGs, for interoperability with that toolchain.

TomAugspurger, Nov 12 '21 22:11

The xcog implementation looks pretty neat :+1:.

My initial thoughts:

  • Would be fun to call the multi-file COG output format czar or czarr :smile:

  • Having it in its own repo, like Zarr (https://github.com/zarr-developers/zarr-python), might help with adoption of the format in other projects such as GDAL.

  • Maybe stackstac would be interested in writing a dask xarray to a STAC dataset on disk?

  • If this gets added: https://github.com/pydata/xarray/issues/5954

    Then xcog could be a backend:

    xds.save_dataset(directory, backend="xcog")
    

snowman2, Nov 13 '21 00:11