Rename bands as variables using long_name attribute

Open lopezvoliver opened this issue 1 year ago • 9 comments

The typical Geotiff images that I use have a description for each band that rioxarray sets as the long_name attribute.

By default, images are read as an xarray.DataArray with band, y, and x dimensions, and the long_name attribute is a tuple containing the description of each band.

Using band_as_variable=True (available since version 0.13) gives the user the option to instead get an xarray.Dataset with dimensions y and x, and N data variables (one for each band). The variables are simply named band_1, band_2, ..., band_N, which makes sense. Each data variable also carries its own long_name attribute.

I think it would be useful to have another optional keyword argument that renames the data variables to the long_name (description). Currently, this can be done as shown below:

import rioxarray

image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True)
image = image.rename({band:image[band].attrs["long_name"] for band in image})

Having an option to do this directly with rioxarray.open_rasterio would be useful:

image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True, rename_bands=True)

Of course, for backwards compatibility it should be set to False by default and it only makes sense when used with band_as_variable=True.

lopezvoliver avatar Jan 24 '24 11:01 lopezvoliver

https://github.com/corteva/rioxarray/pull/600#discussion_r1011706293

The variable name should only contain alphanumeric characters and underscores. The band description could potentially be a sentence with any characters. This ensures consistency and stability.

snowman2 avatar Jan 24 '24 15:01 snowman2

I don't think this is too terrible for users to do if they have a safe long_name:

image = image.rename({band:image[band].attrs["long_name"] for band in image})
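
If the descriptions are not already safe, one option is to sanitize them before renaming. The following is only a rough sketch, not part of rioxarray: the helper names safe_variable_name and build_rename_map are hypothetical, and the rule used here (replace anything that is not an ASCII letter, digit, or underscore; fall back to the default band_N name when nothing usable remains) is an assumption about what "safe" should mean.

```python
import re


def safe_variable_name(description, fallback):
    """Reduce a free-text band description to letters, digits, and underscores."""
    # Collapse every run of other characters into a single underscore.
    name = re.sub(r"\W+", "_", description, flags=re.ASCII).strip("_")
    if not name:
        # Nothing usable left: keep the default name (e.g. band_1).
        return fallback
    if name[0].isdigit():
        # Avoid names that start with a digit.
        name = f"band_{name}"
    return name


def build_rename_map(descriptions):
    """Map default names (band_1, ...) to sanitized names, avoiding duplicates."""
    rename_map = {}
    used = set()
    for default_name, description in descriptions.items():
        candidate = safe_variable_name(description or "", fallback=default_name)
        if candidate in used:
            # Two bands share a description: append the default name to disambiguate.
            candidate = f"{candidate}_{default_name}"
        used.add(candidate)
        rename_map[default_name] = candidate
    return rename_map


# Made-up descriptions, standing in for image[band].attrs["long_name"].
descriptions = {
    "band_1": "Red (630-690 nm)",
    "band_2": "NIR",
    "band_3": "NIR",
    "band_4": "",
}
rename_map = build_rename_map(descriptions)
print(rename_map)

# With a Dataset opened via band_as_variable=True, this could be used as:
# image = image.rename(build_rename_map(
#     {name: image[name].attrs.get("long_name", "") for name in image.data_vars}
# ))
```

The duplicate handling is deliberately simple and could still collide in contrived cases, so treat it as a starting point.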

snowman2 avatar Jan 24 '24 15:01 snowman2

Maybe we could just put this in the docs as a tip/example?

RichardScottOZ avatar Feb 17 '24 04:02 RichardScottOZ

@snowman2 - I could probably mine things I have done for generic examples, if a general notebook on writing (or reading) rasters would be useful.

e.g. this is what it looks like as an ERS grid, this with LZW compression, or whatever else.

RichardScottOZ avatar Feb 18 '24 00:02 RichardScottOZ

e.g. maybe extending something like this: https://corteva.github.io/rioxarray/html/examples/convert_to_raster.html

for things that I would have liked to see when I first came across that sort of info many moons ago

RichardScottOZ avatar Feb 18 '24 00:02 RichardScottOZ

Those documentation contributions would be great 👍

snowman2 avatar Feb 18 '24 01:02 snowman2

Ok, will see what I can do shortly!

RichardScottOZ avatar Mar 03 '24 04:03 RichardScottOZ

Some things like this? https://github.com/corteva/rioxarray/pull/753

RichardScottOZ avatar Mar 03 '24 05:03 RichardScottOZ

I don't think this is too terrible for users to do if they have a safe long_name:

image = image.rename({band:image[band].attrs["long_name"] for band in image})

While that is totally true, I think it is still a bit confusing that one doesn't end up with the same dataset after a roundtrip (xarray.Dataset->geotiff->xarray.Dataset). When writing a dataset to a geotiff with rioxarray, the data variable names are written out as the band descriptions. So when reading the geotiff back in, it would be consistent to use the descriptions as data variable names.

In that sense, we also could give the behavior of open_rasterio() with band_as_variable=False a second thought. In this case, you get back a DataArray with a long_name attribute that contains a tuple with the band descriptions. The tuple, of course, has the same length as the band dimension and could/should be considered the 3rd dimension's coordinates.
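
To illustrate what I mean, here is a minimal sketch of doing this by hand today. It uses a synthetic DataArray as a stand-in for what open_rasterio() returns with band_as_variable=False (the shape, the band numbers, and the descriptions are made up), and it assumes the descriptions are unique and non-empty:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a 3-band raster read with band_as_variable=False:
# integer band coordinates plus a long_name tuple holding the descriptions.
data = xr.DataArray(
    np.arange(12).reshape(3, 2, 2),
    dims=("band", "y", "x"),
    coords={"band": [1, 2, 3]},
    attrs={"long_name": ("red", "green", "blue")},
)

# Use the band descriptions as the coordinates of the band dimension.
labelled = data.assign_coords(band=list(data.attrs["long_name"]))

# Bands can now be selected by description instead of by number.
red = labelled.sel(band="red")

# Splitting along the band dimension yields one data variable per description.
as_dataset = labelled.to_dataset(dim="band")
print(list(as_dataset.data_vars))
```

The same caveat about unsafe descriptions applies here, since the labels end up as variable names in the last step.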

I don't want to derail this too much, but am curious about your thoughts here: Since geotiffs don't really have the notion of multiple datasets (but just 'bands' / 'channels' / a 3rd dimension), I am wondering if writing a dataset to a geotiff even makes conceptual sense. It might be more consistent to allow writing only 2D or 3D DataArrays. In the latter case, the 3rd dimension's coordinates could/should be used as the band descriptions. Right now, when writing a 3D DataArray to a geotiff, the 3rd dimension's coordinates are discarded.

NiklasPhabian avatar Jul 11 '24 23:07 NiklasPhabian