rioxarray
Option to write many rasters for chunked DataArray?
Currently, `.rio.to_raster` will generate a single raster, even for chunked DataArrays. In the case of very large Dask arrays, it might be more useful to instead write many rasters, perhaps one per chunk. This would align better with, e.g., `dask.dataframe.DataFrame.to_csv`, which by default writes a single CSV file per partition.
This adds some complexity to how the actual filename is determined, but we can rely on some conventions established in dask / elsewhere to come up with something sensible.
https://discourse.pangeo.io/t/generating-cogs-and-stac-items-from-dataarrays/1913 has some more background information.
This is somewhat related to https://github.com/corteva/rioxarray/issues/432, by providing an alternative that wouldn't need locks.
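To make the filename question concrete, here is a minimal sketch of one possible convention, not a proposed rioxarray API. It assumes the chunk layout is given in the same form as dask's `Array.chunks` (one tuple of block sizes per dimension) and borrows the `*` placeholder idea from `to_csv`. The helper names `chunk_slices` and `chunk_filename`, and the `out-*.tif` template, are hypothetical.

```python
from itertools import product


def chunk_slices(chunks):
    """Yield (block_index, slices) for every block of a chunked array.

    `chunks` has the same layout as dask's `Array.chunks`, e.g.
    ((512, 512), (512, 488)) for a 1024 x 1000 array.
    """
    # For each dimension, compute the (start, stop) bounds of every block.
    bounds = []
    for sizes in chunks:
        dim_bounds, pos = [], 0
        for size in sizes:
            dim_bounds.append((pos, pos + size))
            pos += size
        bounds.append(dim_bounds)

    # Walk over all combinations of block positions.
    for index in product(*(range(len(b)) for b in bounds)):
        yield index, tuple(slice(*bounds[d][i]) for d, i in enumerate(index))


def chunk_filename(template, index):
    """Replace '*' in the template with the block index joined by '-'."""
    return template.replace("*", "-".join(str(i) for i in index))


# Hypothetical layout: a 1024 x 1000 (y, x) array split into four blocks.
plan = {
    chunk_filename("out-*.tif", index): slices
    for index, slices in chunk_slices(((512, 512), (512, 488)))
}
```

Each entry in `plan` maps a filename to the `(y, x)` slices of one block, so a writer could select that block with `isel` and pass it to `.rio.to_raster` without any two tasks touching the same file.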
What about this: https://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_zarr.html
GDAL 3.4 added support for Zarr.
Or do you specifically need GeoTIFF?
In this case, specifically COGs, for interoperability with that toolchain.
The xcog implementation looks pretty neat :+1:.
My initial thoughts:
- Would be fun to call the multi-file COG output format `czar` or `czarr` :smile:
- Having it as its own repo like Zarr (https://github.com/zarr-developers/zarr-python) might be helpful for potential adoption of the format in other projects such as GDAL.
- Maybe `stackstac` would be interested in writing a dask xarray to a STAC dataset on disk?
- If this gets added: https://github.com/pydata/xarray/issues/5954, then xcog could be a backend: `xds.save_dataset(directory, backend="xcog")`