rioxarray icon indicating copy to clipboard operation
rioxarray copied to clipboard

rioxarray.write_transform() not changing transform

Open nilsleh opened this issue 2 years ago • 10 comments

Problem description

I am working with CMIP6 data and looking at utilizing rioxarray for possibly combining different data streams, so I am looking for a general approach through rioxarray. For this am using the merge_arrays() function, with the hope of applying it to not just for these specific CMIP6 files. However, for these files I am encountering rasterio.errors.WindowError: Bounds and transform are inconsistent Errors which seem to be due to a wrong transformation being read from the src files. The output_transform is computed correctly and works, and therefore I was thinking about setting the transform beforehand to the array. However, write_transform() does not change the transformation in this case, so I am wondering what I am missing or if there is another way to go about this. Data for the following example can be found here.

Code Sample

import xarray as xr
from rasterio.transform import Affine
from rioxarray.merge import merge_arrays

output_transform = Affine(1.0, 0.0, -0.5, 0.0, -1.0, 329.5)

arr_one = xr.open_dataset("CanESM5_r1i1p1f1_tos.nc")["tos"].squeeze()
arr_one.rio.set_spatial_dims("i", "j", inplace=True)

print(f"Before: {arr_one.rio.transform()}")
arr_one.rio.write_transform(output_transform, inplace=True)
print(f"After: {arr_one.rio.transform()}")

arr_two = xr.open_dataset("CanESM5_r1i1p1f1_zos.nc")["zos"].squeeze()
arr_two.rio.set_spatial_dims("i", "j", inplace=True)
arr_two.rio.write_transform(output_transform, inplace=True)

output = merge_arrays([arr_one, arr_two])

Environment Information

rioxarray - 0.15.0 rasterio - 1.3.6 xarray - 2023.6.0

nilsleh avatar Sep 13 '23 15:09 nilsleh

@snowman2 can you comment on this issue? We would like to add support for HDF5/NetCDF to TorchGeo and rioxarray seems to be the correct way to simulate a rasterio file but we're stuck on the above example. We're not sure if we're doing something wrong or this is a bug in rioxarray.

adamjstewart avatar Nov 09 '23 14:11 adamjstewart

Inspecting the file: image

It appears that the dimensions i and j don't contain the geospatial coordinate information and instead contain the indexes of the grid. This is problematic as those coordinates are used to re-calculate the resolution/transform when the exist even if the transform has been written as it is usually more reliable (this happens quite a bit in the rioxarray code). For best results, I recommend replacing those values with the correct x & y geospatial coordinate arrays.

snowman2 avatar Nov 09 '23 15:11 snowman2

Side note: rio.write_transform is helpful to add the transform when the coordinate arrays do not exist (to save disk space).

snowman2 avatar Nov 09 '23 15:11 snowman2

image

snowman2 avatar Nov 09 '23 15:11 snowman2

This is how rioxarray adds coordinates based on the transform: https://github.com/corteva/rioxarray/blob/c15b86061feff8c2c7b0964f19922a3154a85f1a/rioxarray/_io.py#L1203-L1206

snowman2 avatar Nov 09 '23 15:11 snowman2

Thanks for the suggestion. For the zip file previously attached, the case seems to be that the defined coordinate grid from longitude(i,j) is not equally spaced for the coordinates. For example, inspecting arr_one.longitude.data[:,0]. Thus I am not sure how replacing the i, j coordinates with an appropriate geospatial coordinate 1d arrays would go and it seems that that is a separate issue.

So apart from that I included some additional data here where the coordinate issue is not of concern, which might help to get more closely to understanding the actual transform error rasterio.errors.WindowError: Bounds and transform are inconsistent.

As background information, for the torchgeo dataset implementation we are trying to achieve the following:

  • support different data streams or sources that are .nc or .hdf5 file (i.e. CMIP data etc.)
  • for a particular region of interest or bounding box, extract the respective area for each data variable of interest and combine the array by merging them spatially
import xarray as xr
from rioxarray.merge import merge_arrays


query = {
    "minx": 0,
    "miny": 0,
    "maxx": 50,
    "maxy": 50
}

_crs = "EPSG:4326"
spatial_x_name = "longitude"
spatial_y_name = "latitude"

data_variables = ["sst", "sla"]

alt_ds= xr.open_dataset("/eo/dt_global_twosat_phy_l4_199311_vDT2021-M01.nc")
alt_ds = alt_ds.assign_coords(longitude=(alt_ds.longitude % 360 - 180)).roll(
        longitude=(alt_ds.dims["longitude"] // 2)
    )
sst_ds = xr.open_dataset("/eo/HadISST1_SST_update.nc")
sst_ds = sst_ds.sortby("latitude", ascending=True)

data_arrays = []
for ds in [sst_ds, alt_ds]:
    ds.rio.set_spatial_dims(spatial_x_name, spatial_y_name, inplace=True)

    for variable in data_variables:
        if hasattr(ds, variable):
            da = ds[variable]
            if not da.rio.crs:
                da.rio.write_crs(_crs, inplace=True)
            elif da.rio.crs != _crs:
                da = da.rio.reproject(_crs)
            # clip box ignores time dimension
            clipped = da.rio.clip_box(
                minx=query["minx"], miny=query["miny"], maxx=query["maxx"], maxy=query["maxy"]
            )
            # rioxarray expects this order
            clipped = clipped.transpose("time", spatial_y_name, spatial_x_name, ...)

            data_arrays.append(clipped.squeeze())


merged_data = merge_arrays(
    data_arrays, bounds=(query["minx"], query["miny"], query["maxx"], query["maxy"])
).data

Running that snippet I get:

ile "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rioxarray/merge.py", line 175, in merge_arrays
    merged_data, merged_transform = _rio_merge(
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/merge.py", line 341, in merge
    src_window = windows.from_bounds(int_w, int_s, int_e, int_n, src.transform)
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/windows.py", line 323, in from_bounds
    raise WindowError("Bounds and transform are inconsistent")
rasterio.errors.WindowError: Bounds and transform are inconsistent

So I think there is a mismatch between how the bounds and transforms are defined vs how they are expected in rioxarray and I am not sure where exactly this difference is.

nilsleh avatar Nov 13 '23 14:11 nilsleh

#209 may be helpful.

snowman2 avatar Nov 13 '23 14:11 snowman2

@snowman2 thanks for the hint! We'll see if that addresses the coordinate issue in the first paragraph of https://github.com/corteva/rioxarray/issues/698#issuecomment-1808219206. Do you have any thoughts on the other paragraphs of that comment?

adamjstewart avatar Nov 14 '23 13:11 adamjstewart