pygmt
pygmt copied to clipboard
Integrate with rioxarray and implement GMT_IMAGE to read RGB GeoTIFFs
Description of the desired feature
This is a parallel issue to #578 and #1442. While researching how to read GeoTIFFs into an xarray.DataArray
, specifically GeoTIFFs with RGB bands stored in a single band but with a color lookup table, I discovered that there wasn't really a straightforward solution :sweat_smile: So this issue is mainly to jot down some notes.
1) Reading GeoTiff into xarray.DataArray
Method A: rioxarray.open_rasterio
Pros: Reads in data with proper dtype (e.g. uint8) and proper spatial_ref
Cons: rioxarray
is a bit of a heavy dependency as it requires scipy
(edit: may be resolved with https://github.com/corteva/rioxarray/issues/413, edit2: rioxarray=0.8.0
has made scipy
optional!)
import pygmt
import rioxarray
filename_or_obj = pygmt.which(fname="@earth_day_01d", download="a")
with rioxarray.open_rasterio(filename_or_obj) as dataarray:
print(dataarray)
gives
<xarray.DataArray (band: 1, y: 180, x: 360)>
array([[[251, 251, ..., 251, 251],
[251, 252, ..., 251, 252],
...,
[ 68, 16, ..., 68, 217],
[149, 149, ..., 149, 16]]], dtype=uint8)
Coordinates:
* band (band) int64 1
* x (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
* y (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
spatial_ref int64 0
Attributes:
scale_factor: 1.0
add_offset: 0.0
Method B: xr.open_dataarray(..., engine="rasterio")
Pros: ~~Requires only rasterio
(a Pythonic GDAL wrapper) to be installed, which keeps things lightweight~~ Easier to put into pygmt.load_dataarray
as it only requires the 'engine' argument
Cons: Seems to read in data as float32 by default
Note: There was an xr.open_rasterio
function in xarray v0.19.0, but it is pending deprecation, see https://github.com/pydata/xarray/issues/4697, and removal PR at https://github.com/pydata/xarray/pull/5808.
with xr.open_dataarray(filename_or_obj) as dataarray:
print(dataarray)
produces
<xarray.DataArray 'band_data' (band: 1, y: 180, x: 360)>
array([[[251., 251., ..., 251., 251.],
[251., 252., ..., 251., 252.],
...,
[ 68., 16., ..., 68., 217.],
[149., 149., ..., 149., 16.]]], dtype=float32)
Coordinates:
* band (band) int64 1
* x (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
* y (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
spatial_ref int64 ...
TLDR My preference is to use (A) rioxarray.open_rasterio
because it reads in data properly as uint8, with a fallback to (B) xr.open_dataarray(..., engine=rasterio)
if only rasterio
is installed.
2) Reading RGB colors from a GeoTIFF
Here's where things get a bit tricky. Neither rioxarray.open_rasterio
nor xarray.open_dataarray
reads in GeoTIFFs with color interpretation, i.e. RGB colors stored in a GeoTIFF will be read as a single band rather than 3 separate bands! This includes the @earth_day_01d
and @earth_night_01d
imagery. To be clear, non-color GeoTIFFs (e.g. DEMs, single band Landsat/Sentinel-2 images) are fine.
Good news is that the color interpretation mechanism is present in rasterio
see https://github.com/mapbox/rasterio/blob/1.2.8/docs/topics/color.rst and https://github.com/mapbox/rasterio/issues/100. This is wrapping around the GDAL Color Table stuff (https://gdal.org/user/raster_data_model.html#color-table)
Bad news is that we'll need to have that mechanism in rioxarray
or xarray
in order to use it (unless we want to go with a pure rasterio
route). There is an issue open though at https://github.com/corteva/rioxarray/issues/191, so it might be a matter of helping to open a Pull Request in those libraries to implement the feature.
3) Implementing GMT_IMAGE in PyGMT
So once we've solved (1) and (2) above, the next step would be, how to pass that 3-band RGB color xarray.DataArray
into GMT? Max mentioned at https://github.com/GenericMappingTools/pygmt/issues/578#issuecomment-890355308 that we might need to use the GMT_IMAGE
resource, but that hasn't been added into pygmt.clib as of PyGMT v0.4.1.
Probably need something like a virtualfile_from_image
function, similar to virtualfile_from_grid
(done in #159) at https://github.com/GenericMappingTools/pygmt/blame/v0.4.1/pygmt/clib/session.py#L1278
Are you willing to help implement and maintain this feature? Something to do for PyGMT v0.6.0 I think.
The steps I think would be to:
- [x] Add
rioxarray
and/orrasterio
as an optional dependency for PyGMT (added in #2394) - [ ] Contribute to
rioxarray
to get the color interpretation working (https://github.com/corteva/rioxarray/pull/414) - [ ] Modify
pygmt.load_dataarray
function to handle reading in RGB GeoTIFFs - [x] Implement
GMT_IMAGE
in PyGMT (#3338, #3128) - [ ] maybe some other stuff too
References:
- GDAL Raster Data Model https://gdal.org/user/raster_data_model.html#color-table
- OGC GeoTIFF Standard https://www.ogc.org/standards/geotiff
- TIFF standard, see Section 5: Palette-color Images https://www.adobe.io/content/dam/udp/en/open/standards/tiff/TIFF6.pdf
Side note: Both A & B require rioxarray
ref.
Thanks for chiming in @snowman2! I see you've opened an issue with making scipy
optional in rioxarray
at https://github.com/corteva/rioxarray/issues/413, that will really help slim things down since SciPy is a ~30MB dependency by itself.
Side note: Both A & B require
rioxarray
ref.
I stand corrected, it's only the xr.open_rasterio
function which requires rasterio
, but that looks to be deprecated soon following https://github.com/pydata/xarray/pull/5808 so we won't use that.
@joa-quim how do you link the data to the GMT_IS_IMAGE
family in GMT.jl? Is this done via GMT_Put_Matrix
in the same way as for GMT_IS_GRID
?
I don't use GMT_Put_Matrix
for grids (well maybe sometimes under certain circumstances but don't remember now). What I do for all types is to wrap the GMT type interface for each of them. So for images I do this, which wraps an image type. Images and grids differ for very little but aren't equal either. Again, interfacing with GMT native types: grids, images, datasets, cpts, ps,
etc... is how we do the IO business in MEX and Julia, which has the further advantage of not requiring any writing/reading of temporary files.
That makes sense, thanks for the explanation and sharing the link to the code.