rioxarray
Rename bands as variables using long_name attribute
The typical GeoTIFF images that I use have a description for each band, which rioxarray sets as the long_name attribute.
By default, images are read as an xarray.DataArray with band, y, and x dimensions, and the long_name attribute is a tuple containing the description of each band.
Using band_as_variable=True (available since version 0.13) gives the user the option to instead get an xarray.Dataset with dimensions y and x and N data variables (one for each band). The variables are simply named band_1, band_2, ..., band_N, which makes sense. The long_name attribute can also be found on each data variable.
I think it would be useful to have another optional keyword argument that renames the data variables to the long_name (description). Currently, this can be done as shown below:
import rioxarray
image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True)
image = image.rename({band:image[band].attrs["long_name"] for band in image})
Having an option to do this directly with rioxarray.open_rasterio would be useful:
image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True, rename_bands=True)
Of course, for backwards compatibility it should be set to False by default and it only makes sense when used with band_as_variable=True.
https://github.com/corteva/rioxarray/pull/600#discussion_r1011706293
The variable name should only contain alphanumeric characters and underscores, whereas a band description could be an arbitrary sentence containing any characters. Naming the variables band_1, ..., band_N ensures consistency and stability.
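One way to reconcile a long_name-based rename with that naming constraint is the following minimal sketch. It is not something rioxarray provides. It reduces each description to letters, digits, and underscores. When nothing usable remains, it falls back to the existing band_N name. When two names collide, it appends a numeric suffix. The helpers safe_variable_name and build_rename_map are hypothetical names, and restricting to the ASCII character set is an assumption.

```python
import re


def safe_variable_name(description, fallback):
    # Hypothetical helper: collapse every run of characters outside
    # [0-9A-Za-z_] (ASCII only, an assumption) into a single "_",
    # then trim underscores from both ends.
    if not description:
        return fallback
    name = re.sub(r"[^0-9A-Za-z_]+", "_", str(description)).strip("_")
    # Nothing usable left (e.g. the description was only punctuation).
    return name or fallback


def build_rename_map(descriptions, reserved=()):
    # Hypothetical helper.
    # descriptions: {variable name: long_name or None}
    # reserved: names that must not be produced (e.g. coordinate names)
    used = set(reserved)
    mapping = {}
    for var, description in descriptions.items():
        candidate = safe_variable_name(description, fallback=var)
        unique, suffix = candidate, 2
        # Append _2, _3, ... until the name is not already taken.
        while unique in used:
            unique = f"{candidate}_{suffix}"
            suffix += 1
        used.add(unique)
        mapping[var] = unique
    return mapping
```

Applied to a dataset opened with band_as_variable=True, this could be used as image = image.rename(build_rename_map({band: image[band].attrs.get("long_name") for band in image}, reserved=image.coords)). Passing the coordinate names as reserved prevents a description such as "x" from clashing with an existing coordinate.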
I don't think this is too terrible for users to do if they have a safe long_name:
image = image.rename({band:image[band].attrs["long_name"] for band in image})
Maybe we could just put this in the docs as a tip/example?
@snowman2 - I could probably mine stuff I have done for generic examples if a general notebook on writing (or reading) rasters would be useful.
e.g. what a raster looks like when written as an ERS grid, or with LZW compression, or whatever else.
e.g. maybe extending something like this: https://corteva.github.io/rioxarray/html/examples/convert_to_raster.html
with the things I would have liked to see when I first came across that sort of information many moons ago.
Those documentation contributions would be great 👍
Ok, will see what I can do shortly!
Some things like this? https://github.com/corteva/rioxarray/pull/753
I don't think this is too terrible for users to do if they have a safe long_name:
image = image.rename({band:image[band].attrs["long_name"] for band in image})
While that is totally true, I think it is still a bit confusing that one doesn't end up with the same dataset after a round trip (xarray.Dataset -> GeoTIFF -> xarray.Dataset). When rioxarray writes a dataset to a GeoTIFF, the data variable names are written out as the band descriptions. So when reading the GeoTIFF back in, it would be consistent to use the descriptions as data variable names.
In that sense, we could also give the behavior of open_rasterio() with band_as_variable=False a second thought. In this case, you get back a DataArray with a long_name attribute containing a tuple of the band descriptions. The tuple, of course, has the same length as the band dimension and could/should be considered the coordinates of that 3rd dimension.
I don't want to derail this too much, but I am curious about your thoughts here: since GeoTIFFs don't really have the notion of multiple datasets (just 'bands' / 'channels' / a 3rd dimension), I wonder whether writing a Dataset to a GeoTIFF even makes conceptual sense. It might be more consistent to allow writing only 2D or 3D DataArrays. In the latter case, the 3rd dimension's coordinates could/should be used as the band descriptions. Right now, when writing a 3D DataArray to a GeoTIFF, the 3rd dimension's coordinates are discarded.
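The following minimal sketch treats the long_name tuple as the band coordinates. It assumes, without confirming it here, that a single-band file may carry long_name as a plain string rather than a one-element tuple. band_labels is a hypothetical helper. When the attribute is missing or its length does not match the band dimension, it falls back to the integer labels 1..N.

```python
def band_labels(long_name, n_bands):
    # Hypothetical helper. A plain string is treated as the description
    # of a single band (an assumption about single-band files).
    if isinstance(long_name, str):
        long_name = (long_name,)
    # Missing or mismatched attribute: fall back to 1-based band numbers.
    if long_name is None or len(long_name) != n_bands:
        return list(range(1, n_bands + 1))
    return list(long_name)
```

For a DataArray returned by open_rasterio(), the labels could then be attached with da = da.assign_coords(band=band_labels(da.attrs.get("long_name"), da.sizes["band"])). This sketch does not check whether the writers would turn string band coordinates back into band descriptions.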