Support for GeoTIFF
Cate should also be able to read product files in GeoTIFF format.
Implementation: We'll need a package capable of reading TIFF/GeoTIFF, e.g. GDAL, to read the files and create objects that behave similarly to the xarray.Dataset instances created from netCDF (used by the other CCI gridded products). Ideally, we can develop or reuse an xarray backend for the TIFF package.
See
- https://github.com/mapbox/rasterio/issues/576
- https://github.com/pydata/xarray/pull/1070
- https://github.com/pydata/xarray/issues/790
GeoTIFF can now be read from xarray through its rasterio backend: https://github.com/pydata/xarray/pull/1260
Hi @forman, the GeoTIFF reader is almost ready, but I have a question about normalization.
A GeoTIFF usually has its own Cartesian projection, one or more bands of values, and two dimensions (x, y).
We can create an xarray dataset with dims x, y and each band set as a variable by reading the TIFF directly. We can also add two coordinates, 'lon' and 'lat', to the output dataset by transforming the source projection coordinates to the Cate-compatible [epsg:4326]. However, lon and lat cannot be dimensions unless we restructure the entire dataset: the two new coordinate values are only accessible through the point (x, y) of the original image, i.e. lon(x, y) and lat(x, y). Here is an example of a GeoTIFF read from a file and converted into a dataset:
Dimensions: (x: 2540, y: 3932)
Coordinates:
* y (y) float64 3.5 3.52
* x (x) float64 -1.5 -1.8
Data variables:
1 (y, x) one variable for each band
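For reference, the band-per-variable layout above can be sketched with plain xarray. This is only an illustration under assumptions: the (band, y, x) array is synthetic rather than read from a real GeoTIFF, and the sizes, coordinate values and three-band count are made up.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a 3-band GeoTIFF read as (band, y, x).
# Shapes and coordinate values are invented for illustration.
bands = xr.DataArray(
    np.arange(3 * 4 * 5, dtype="float64").reshape(3, 4, 5),
    dims=("band", "y", "x"),
    coords={
        "band": [1, 2, 3],
        "y": np.linspace(3.5, 3.44, 4),
        "x": np.linspace(-1.5, -1.42, 5),
    },
)

# One data variable per band, named after the band index ("1", "2", ...).
# drop=True removes the leftover scalar "band" coordinate from each variable.
ds = xr.Dataset(
    {str(b): bands.sel(band=b, drop=True) for b in bands["band"].values}
)
print(ds)
```

The variable names are strings here so that they behave like ordinary netCDF variable names; whether the reader should use the band index or a band description as the name is an open choice.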
In addition, we can transform each point (x, y) into longitude and latitude values and add them to the dataset itself, giving an output with this structure:
Dimensions: (x: 2540, y: 3932)
Coordinates:
* y (y) float64 3.5 3.52
* x (x) float64 -1.5 -1.8
lon (y, x) float64 -88.86 -88.86
lat (y, x) float64 -73.1 -73.2
Data variables:
1 (y, x) one variable for each band
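A minimal sketch of how the 2-D lon/lat coordinates could be attached, assuming pyproj is available. The helper name `add_lon_lat`, the source CRS (UTM zone 33N, EPSG:32633) and the tiny dataset are assumptions chosen for illustration, not the actual reader code.

```python
import numpy as np
import xarray as xr
from pyproj import Transformer


def add_lon_lat(ds, src_crs):
    """Attach 2-D lon/lat coordinates derived from the 1-D x/y coordinates.

    Hypothetical helper: `src_crs` is whatever CRS the GeoTIFF declares.
    """
    # always_xy=True forces (x, y) in and (lon, lat) out, whatever axis
    # order the CRS definitions themselves prescribe.
    transformer = Transformer.from_crs(src_crs, "EPSG:4326", always_xy=True)
    # meshgrid returns arrays shaped (len(y), len(x)), matching dims (y, x).
    xx, yy = np.meshgrid(ds["x"].values, ds["y"].values)
    lon, lat = transformer.transform(xx, yy)
    return ds.assign_coords(lon=(("y", "x"), lon), lat=(("y", "x"), lat))


# Made-up dataset in UTM zone 33N (an assumption; any projected CRS works).
ds = xr.Dataset(
    {"1": (("y", "x"), np.zeros((3, 2)))},
    coords={"y": [0.0, 1000.0, 2000.0], "x": [500000.0, 501000.0]},
)
ds_ll = add_lon_lat(ds, "EPSG:32633")
print(ds_ll)
```

lon and lat end up as non-index coordinates on (y, x), so they can be used for plotting or lookup but not for label-based selection along a lon or lat dimension.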
lon and lat are now present in the output, but they are not dimensions. To normalize the structure so that it can be mapped on the globe, we should provide lon, lat and time as dimensions. Here comes the issue: lon can have up to x*y distinct values, and so can lat (the original image could be skewed, for example). So if we create dimensions from lon and lat, the array for each band variable will have up to (x*y)^2 elements! The memory required grows quadratically with the number of pixels, and the array for each band will be filled mostly with missing values.
We can try to aggregate similar values along each dimension in order to reduce the size (there are indeed a lot of repeated values in lon and lat), but this step is processing-intensive, and its cost depends on the image size and the number of bands. Following this direction, we could generate a fully normalized dataset that could be mapped and used like a normal netCDF dataset; otherwise, I think it is only possible to show the dataset as an image.
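To put rough numbers on the size concern, here is a back-of-the-envelope calculation for the image dimensions quoted above. It assumes the worst case where every pixel has a distinct lon and a distinct lat value, and float64 storage; both are assumptions.

```python
# Image size taken from the example dataset above.
nx, ny = 2540, 3932
n_points = nx * ny                      # pixels in one band
bytes_per_value = 8                     # float64 (assumed dtype)

# Native layout: one value per pixel on the (y, x) grid.
native_bytes = n_points * bytes_per_value

# Worst case with lon and lat as dimensions: every pixel contributes a
# distinct lon and a distinct lat, so the dense array is n_points x n_points.
worst_case_bytes = n_points * n_points * bytes_per_value

# Only n_points of those cells hold data; the rest would be fill values.
fill_fraction = 1 - n_points / (n_points * n_points)

print(f"native:     {native_bytes / 1e6:.1f} MB per band")
print(f"worst case: {worst_case_bytes / 1e12:.1f} TB per band")
print(f"empty cells: {fill_fraction:.7%}")
```

The growth is quadratic in the pixel count, which is already far beyond what fits in memory for an image of this size.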
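One way the aggregation idea could look is to bin the pixels onto a regular lon/lat grid of a chosen resolution and average the values that fall into the same cell. The sketch below uses only NumPy; the function name `bin_to_regular_grid`, the cell-averaging rule and the sample values are assumptions, and a real implementation would more likely rely on a proper reprojection/resampling routine.

```python
import numpy as np


def bin_to_regular_grid(lon, lat, values, resolution):
    """Average samples with 2-D lon/lat coordinates onto a regular grid.

    Hypothetical helper. Returns (lat_centers, lon_centers, grid), where
    grid has shape (n_lat, n_lon) and NaN marks cells without any sample.
    """
    lon = np.asarray(lon, dtype="float64").ravel()
    lat = np.asarray(lat, dtype="float64").ravel()
    values = np.asarray(values, dtype="float64").ravel()

    # Align the grid origin to a multiple of the resolution.
    lon0 = np.floor(lon.min() / resolution) * resolution
    lat0 = np.floor(lat.min() / resolution) * resolution

    # Cell index of every sample along each axis.
    ix = np.floor((lon - lon0) / resolution).astype(int)
    iy = np.floor((lat - lat0) / resolution).astype(int)
    n_lon, n_lat = ix.max() + 1, iy.max() + 1

    # Unbuffered accumulation so repeated indices are all counted.
    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon))
    np.add.at(total, (iy, ix), values)
    np.add.at(count, (iy, ix), 1)

    grid = np.full((n_lat, n_lon), np.nan)
    filled = count > 0
    grid[filled] = total[filled] / count[filled]

    lon_centers = lon0 + (np.arange(n_lon) + 0.5) * resolution
    lat_centers = lat0 + (np.arange(n_lat) + 0.5) * resolution
    return lat_centers, lon_centers, grid


# Made-up, slightly skewed 2x2 image on a 1-degree target grid.
lon2d = [[0.1, 1.1], [0.2, 1.2]]
lat2d = [[0.1, 0.1], [1.1, 1.1]]
band = [[1.0, 2.0], [3.0, 4.0]]
lat_c, lon_c, grid = bin_to_regular_grid(lon2d, lat2d, band, resolution=1.0)
print(grid)
```

With this approach the output size depends on the bounding box and the chosen resolution rather than on the number of distinct lon/lat values, at the cost of resampling the data. The sketch does not handle images that cross the antimeridian.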
Hi @papesci @forman, please check: the fix has been merged into the master branch.