
Unexpected behaviour when modifying coords with `assign_coords`

Open • patel-zeel opened this issue 1 year ago • 5 comments

Code Sample, a copy-pastable example if possible

import numpy as np
import rioxarray as rxr

da = rxr.open_rasterio("https://huggingface.co/datasets/Zeel/tmp/resolve/main/1524-1184.tif")
# save locally
da.rio.to_raster("tmp.tif")

# Load
da = rxr.open_rasterio("tmp.tif")
print("Original values", da.x.values[:5])
da = da.assign_coords(x = np.round(da.x, 2))
print("Modified values before saving", da.x.values[:5])

# Save
da.rio.to_raster("tmp2.tif")

# Reload
da = rxr.open_rasterio("tmp2.tif")
print("Modified values after saving and reloading", da.x.values[:5])

Output

Original values [9783942.00780707 9783946.78512134 9783951.56243561 9783956.33974987
 9783961.11706414]
Modified values before saving [9783942.01 9783946.79 9783951.56 9783956.34 9783961.12]
Modified values after saving and reloading [9783942.01       9783946.7873138  9783951.5646276  9783956.34194139
 9783961.11925519]

Expected Output

Original values [9783942.00780707 9783946.78512134 9783951.56243561 9783956.33974987
 9783961.11706414]
Modified values before saving [9783942.01 9783946.79 9783951.56 9783956.34 9783961.12]
Modified values after saving and reloading [9783942.01 9783946.79 9783951.56 9783956.34 9783961.12]

Environment Information

Installed fresh in Google Colab with `pip install rioxarray`

Question

If this is not the recommended way to modify the coordinates, could you point me to the recommended way?

patel-zeel • Nov 06 '24 05:11

After rounding, your coordinates are no longer evenly spaced (dx is no longer constant):

import numpy
import rioxarray

da = rioxarray.open_rasterio("tmp.tif")
print("Original values", da.x.values[:5])
print("DX", da.x.values[:5]-da.x.values[1:6])
da = da.assign_coords(x = numpy.round(da.x, 2))
print("Modified values before saving", da.x.values[:5])
print("DX", da.x.values[:5]-da.x.values[1:6])

Original values [9783942.00780707 9783946.78512134 9783951.56243561 9783956.33974987
 9783961.11706414]
DX [-4.77731427 -4.77731427 -4.77731427 -4.77731427 -4.77731427]
Modified values before saving [9783942.01 9783946.79 9783951.56 9783956.34 9783961.12]
DX [-4.78 -4.77 -4.78 -4.78 -4.77]
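The effect can be reproduced without a raster. The minimal NumPy sketch below rebuilds five evenly spaced coordinates from the first value and spacing printed above, rounded to the printed precision. It then rounds each coordinate to 2 decimals independently. Every value moves by a different amount, so the spacing alternates between roughly 4.77 and 4.78.

```python
import numpy as np

# Evenly spaced x coordinates, using the first value and spacing
# printed above (truncated to the printed precision).
x0 = 9783942.00780707
res = 4.77731427
x = x0 + res * np.arange(5)

# The original spacing is constant (up to floating-point noise).
dx = np.diff(x)
print(dx)

# Rounding each coordinate on its own shifts each one by a different
# amount, so the spacing is no longer constant.
x_rounded = np.round(x, 2)
dx_rounded = np.diff(x_rounded)
print(x_rounded)
print(dx_rounded)
```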

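After saving and reloading, the coordinates are evenly spaced again. A GeoTIFF stores a single affine transform (origin and pixel size) rather than one coordinate value per pixel, so the coordinates are regenerated from that transform when the file is reopened: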

# Save
da.rio.to_raster("tmp2.tif")
# Reload
da = rioxarray.open_rasterio("tmp2.tif")
print("Modified values after saving and reloading", da.x.values[:5])
print("DX", da.x.values[:5]-da.x.values[1:6])

Modified values after saving and reloading [9783942.01       9783946.7873138  9783951.5646276  9783956.34194139
 9783961.11925519]
DX [-4.7773138 -4.7773138 -4.7773138 -4.7773138 -4.7773138]
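The round trip can be sketched in NumPy. In the output above, reloading keeps the first value (9783942.01) and applies one constant spacing to every pixel. The sketch assumes the writer derives the pixel width as the average spacing, (last - first) / (n - 1), and places the left edge half a pixel before the first center. That derivation (the hypothetical `transform_from_coords`) is an illustration only and has not been checked against rioxarray's `to_raster`. The regeneration step (`coords_from_transform`) reflects how pixel-center coordinates follow from an origin and a pixel size. The inputs are only the five rounded values shown above, so the regenerated spacing will not match the full-raster spacing in the output.

```python
import numpy as np

def coords_from_transform(left_edge, pixel_width, width):
    """Pixel-center x coordinates described by an origin and pixel size.

    A transform stores one origin and one pixel width, so every
    regenerated coordinate is evenly spaced.
    """
    return left_edge + pixel_width * (np.arange(width) + 0.5)

def transform_from_coords(x):
    """Hypothetical inverse (an assumption, not rioxarray's code):
    pixel width as the average spacing, left edge half a pixel before
    the first pixel center."""
    pixel_width = (x[-1] - x[0]) / (len(x) - 1)
    left_edge = x[0] - pixel_width / 2
    return left_edge, pixel_width

# The rounded, unevenly spaced coordinates from the thread.
x_rounded = np.array([9783942.01, 9783946.79, 9783951.56, 9783956.34, 9783961.12])

left_edge, pixel_width = transform_from_coords(x_rounded)
x_reloaded = coords_from_transform(left_edge, pixel_width, len(x_rounded))

# The first value survives, but the per-pixel rounding does not:
# the interior values are replaced by evenly spaced ones.
print(x_reloaded)
print(np.diff(x_reloaded))
```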

snowman2 • Nov 06 '24 20:11

Thank you for the response, @snowman2! Now I understand what's going on. My actual use case is the following:

  • I am trying to merge multiple TIFF files with `xr.open_mfdataset`. Their coordinates are nearly identical, but floating-point differences result in a non-monotonic index after merging. So I wanted to round the coordinates to make sure that nearly identical coordinates become exactly the same. Is there a better way to achieve the same instead of what I have done above?
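One alternative to rounding to a fixed number of decimals is to snap each coordinate to its nearest integer pixel index on a shared grid. The minimal NumPy sketch below assumes the grid origin and resolution are known, for example taken from one reference tile's first coordinate and `rio.resolution()`. The helper `snap_to_grid` and the two tiles are hypothetical. The same integer index always produces the same float, so overlapping pixels get bitwise-identical coordinates, and the spacing stays uniform instead of alternating as it does after `np.round(x, 2)`.

```python
import numpy as np

def snap_to_grid(coords, origin, resolution):
    """Hypothetical helper: map each coordinate to the exact position
    of its nearest pixel index on the grid (origin, resolution)."""
    index = np.round((coords - origin) / resolution)
    return origin + index * resolution

origin = 9783942.00780707
resolution = 4.77731427

# Two hypothetical tiles whose overlapping pixels (indices 3 and 4)
# differ only by floating-point noise.
tile_a = origin + resolution * np.arange(0, 5)
tile_b = origin + resolution * np.arange(3, 8) + 1e-7

snapped_a = snap_to_grid(tile_a, origin, resolution)
snapped_b = snap_to_grid(tile_b, origin, resolution)

# Raw tiles share no exact values; snapped tiles share the two overlapping ones.
print(len(np.intersect1d(tile_a, tile_b)), len(np.intersect1d(snapped_a, snapped_b)))  # 0 2
```

Each snapped array could then be attached with `da.assign_coords(x=...)` before combining. Because the snapped spacing is uniform, a later `rio.to_raster` round trip should not shift the values the way it shifted the rounded ones, though this is untested here.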

patel-zeel • Nov 07 '24 04:11

> Is there a better way to achieve the same instead of what I have done above?

I recommend referring to https://github.com/corteva/rioxarray/blob/fa35e916e41d785b0a57e0d5dce6189660b4ae3d/rioxarray/_io.py#L848-L891.

That code generates coordinates for only one of the data arrays; the other data arrays in the dataset then inherit those coordinates.
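Applied to clusters of nearly identical files, that idea might look like the minimal NumPy sketch below. It is not the rioxarray code linked above. The helper `share_reference_coords` is hypothetical: it takes the first file's coordinates as the reference and reuses that exact array for every file in the cluster. It first checks that each file is within a tolerance (`atol`, an assumed parameter) so that genuinely different grids are not silently overwritten.

```python
import numpy as np

def share_reference_coords(coord_arrays, atol):
    """Hypothetical helper: return the first array's coordinates for
    every array in a cluster, after checking each one is within atol."""
    reference = np.asarray(coord_arrays[0])
    shared = []
    for coords in coord_arrays:
        coords = np.asarray(coords)
        if coords.shape != reference.shape or not np.allclose(
            coords, reference, rtol=0, atol=atol
        ):
            raise ValueError("coordinates differ by more than atol")
        # Reuse the very same array, so all files get identical values.
        shared.append(reference)
    return shared

# Hypothetical cluster: three files whose x coordinates differ by noise.
base = 9783942.00780707 + 4.77731427 * np.arange(5)
cluster = [base, base + 2e-7, base - 3e-7]

shared = share_reference_coords(cluster, atol=1e-3)
print(all(np.array_equal(s, shared[0]) for s in shared))  # True
```

With xarray, each data array in a cluster could then receive the shared values through `da.assign_coords(x=shared_x, y=shared_y)` before merging, where `shared_x` and `shared_y` are the reference arrays returned for each dimension.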

snowman2 • Nov 07 '24 04:11

Thank you for the reference, @snowman2, but I couldn't fully understand what you meant. I have multiple clusters of files; within each cluster, the coordinates are nearly identical, differing only by floating-point error. I'd really appreciate it if you could provide code or pseudo-code showing how to achieve this.

patel-zeel • Nov 07 '24 05:11

This sounds related to https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140

There's a draft PR in xarray with suggestions on how rioxarray could be changed to avoid materializing coordinates, which is what introduces the floating-point imprecision: https://github.com/pydata/xarray/pull/9543

rbavery • Nov 07 '24 06:11