rioxarray
Handling of DataArrays using longitudes from 0 to 360
I'm currently working with a large set of CF-Compliant data from CORDEX and CMIP5 and attempting to integrate some of the neat features of rioxarray into a project that performs all manner of analyses on xarray-readable data.
I've followed some of the advice on adding a projection to the netCDF4 file; the code that seems to assuage the issues looks similar to:
```python
import geojson
import rasterio.crs
import rioxarray
import xarray

nc = "/some/netcdf/tas.nc"

g = list()
with open("/foo/bar/file.geojson") as gj:
    g.append(geojson.load(gj)[0]["geometry"])

ds = xarray.open_dataset(nc)
crs = rasterio.crs.CRS.from_epsg(4326)

dss = ds.tas
dss = dss.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
dss = dss.rio.write_crs(crs.to_dict())
clipped = dss.rio.clip(g, crs=crs, all_touched=True)
```
I think I'm running into an issue where, although the data is not projected, the longitude values run from 0 to 360 starting at Greenwich. Seeing as most climate data seems to follow `lat_bnds` -> (-90, +90) and `lon_bnds` -> (0, +360), I'm wondering whether there's anything formally within rioxarray to deal with this.

A work-around is to subtract 360 from all lons > 180, re-assign these new lons to `ds.lon`, then use `ds.sortby("lon")`. This seems a bit "hacky" but works for now. Is there a better way?
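A minimal sketch of that work-around, with a toy dataset standing in for the real file (the coordinate name `lon` is assumed to match):

```python
import numpy as np
import xarray

# toy dataset with 0-360 longitudes, as in typical CMIP/CORDEX output
ds = xarray.Dataset(
    {"tas": (("lat", "lon"), np.zeros((3, 8)))},
    coords={"lat": [-45.0, 0.0, 45.0], "lon": np.arange(0.0, 360.0, 45.0)},
)

# shift longitudes > 180 down by 360, then sort so they run -180..180
ds = ds.assign_coords(lon=xarray.where(ds.lon > 180, ds.lon - 360, ds.lon))
ds = ds.sortby("lon")
```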
> I'm wondering whether there's anything formally within rioxarray to deal with this.
Unfortunately, we don't currently support that as far as I am aware.
Side note, I would recommend doing this:

```python
dss = dss.rio.write_crs(4326)         # option 1
dss = dss.rio.write_crs("epsg:4326")  # option 2
```
The `to_dict()` call produces PROJ parameters, which can be lossy. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems.
In the project I'm working on, I've managed to figure out a reliable means of dealing with this problem. I can't promise that my approach is pretty, but is this a feature you would be interested in integrating into rioxarray?
Sounds useful. I have had to do this in the past. Here it is for reference: https://github.com/snowman2/pangaea/blob/a304e9a489cfc0bc1c74e7cb50c3335a4f3d596f/pangaea/xlsm.py#L297-L298
If you would like to submit a PR, I think it would be useful to have.
I have seen other issues where this would be useful:
- https://gis.stackexchange.com/questions/357246/clip-global-data-by-polygon-using-rioxarray-fails-off-by-180-longitude
- https://gis.stackexchange.com/questions/37790/how-to-reproject-raster-from-0-360-to-180-180-with-cutting-180-meridian
- https://gis.stackexchange.com/questions/348172/changing-projection-from-0-360-to-180-180-for-a-simple-raster
@Zeitsperre is this something you still have interest in implementing?
Maybe something like:

```python
self._obj.assign_coords(
    {self.x_dim: (((self._obj[self.x_dim] + 180) % 360) - 180)}
).sortby(self.x_dim)
```
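For reference, the same expression applied outside the accessor, on a toy DataArray (the dimension name `x` is just for illustration):

```python
import numpy as np
import xarray

da = xarray.DataArray(
    np.arange(8.0),
    dims="x",
    coords={"x": np.arange(0.0, 360.0, 45.0)},
)

# ((lon + 180) % 360) - 180 maps 0..360 onto -180..180 in one pass;
# sortby then restores monotonically increasing coordinates
da = da.assign_coords(x=(((da.x + 180) % 360) - 180)).sortby("x")
```

Note that 180 itself maps to -180 with this formula, which is usually fine for global grids.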
Hi @snowman2, yes, this functionality is still interesting to us. Thanks for following up, and sorry for not getting around to contributing something.
We've been building out our version for the last few months. The main need for it is precisely to deal with the first case you identified on GIS SE. The implementation we have now uses a GeoPandas spatial join, or shapely's vectorized operations (based on the approach that regionmask uses), to map out a vector using the grid dimensions from a target xarray object. When the vector overlaps Greenwich (i.e. exceeds the bounds of a 0-360 DataArray), that's when we perform the subset and reassign coordinates.
The biggest concern I can see is the speed of performing this for very large multi-file datasets; I imagine it's quicker to treat the shape and clip first, then perform the longitude reprojection. How do you think one would go about optimizing this?
@Zeitsperre can you provide a simplified code-example of your workflow?
The code needs a refactor, as it relies on some decorators performing workarounds to get shapefiles to be more consistent with grid data. For instance, the code below relies on a function that establishes whether a shapefile's bounds cross the Greenwich meridian and, if so, splits the shapes that fall along it into multipolygons and applies a value of +360 to all negative lons, since shape operations are less intensive than grid operations.
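That preprocessing step could be sketched roughly like this with shapely (the helper name, the bounding boxes, and the straddling-only handling are my assumptions, not the project's actual code):

```python
from shapely.geometry import MultiPolygon, Polygon, box

def wrap_to_0_360(poly):
    """Split a polygon at Greenwich and shift the western half by +360.

    Hypothetical helper for illustration; handles only the simple case of a
    single polygon straddling the meridian.
    """
    west = poly.intersection(box(-180, -90, 0, 90))
    east = poly.intersection(box(0, -90, 180, 90))
    if west.is_empty:  # nothing west of Greenwich: no wrapping needed
        return poly
    # apply +360 to the negative lons so the shape lines up with a 0-360 grid
    west_shifted = Polygon([(x + 360, y) for x, y in west.exterior.coords])
    return MultiPolygon([east, west_shifted])

# a polygon straddling the Greenwich meridian
poly = Polygon([(-10, 40), (10, 40), (10, 50), (-10, 50)])
wrapped = wrap_to_0_360(poly)
```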
Given the x and y dimensions of an xarray object (`ds_copy`) and a shapefile (`poly`):
```python
if len(x_dim.shape) == 1 and len(y_dim.shape) == 1:
    # create a 2d grid of lon, lat values
    lon1, lat1 = np.meshgrid(
        np.asarray(x_dim.values), np.asarray(y_dim.values), indexing="ij"
    )
    dims_out = x_dim.dims + y_dim.dims
    coords_out = dict()
    coords_out[dims_out[0]] = x_dim.values
    coords_out[dims_out[1]] = y_dim.values
else:
    lon1 = x_dim.values
    lat1 = y_dim.values
    dims_out = x_dim.dims
    coords_out = x_dim.coords

# create pandas DataFrame from NetCDF lat and lon points
df = pd.DataFrame(
    {"id": np.arange(0, lon1.size), "lon": lon1.flatten(), "lat": lat1.flatten()}
)
df["Coordinates"] = list(zip(df.lon, df.lat))
df["Coordinates"] = df["Coordinates"].apply(Point)

# create GeoDataFrame (spatially referenced with shifted longitude values if needed)
if wrap_lons:
    wgs84 = CRS.from_string(
        "+proj=longlat +datum=WGS84 +no_defs +type=crs +lon_wrap=180"
    )
gdf_points = gpd.GeoDataFrame(df, geometry="Coordinates", crs=wgs84)

# spatial join geodata points with region polygons and remove duplicates
point_in_poly = gpd.tools.sjoin(gdf_points, poly, how="left", op="intersects")
point_in_poly = point_in_poly.loc[~point_in_poly.index.duplicated(keep="first")]

# extract polygon ids for points
mask = point_in_poly["index_right"]
mask_2d = np.array(mask).reshape(lat1.shape[0], lat1.shape[1])
mask_2d = xarray.DataArray(mask_2d, dims=dims_out, coords=coords_out)

# loop through variables
for v in ds_copy.data_vars:
    if set.issubset(set(mask_2d.dims), set(ds_copy[v].dims)):
        ds_copy[v] = ds_copy[v].where(mask_2d.notnull())

# remove coordinates where all values are outside of region mask
for dim in mask_2d.dims:
    mask_2d = mask_2d.dropna(dim, how="all")
ds_copy = ds_copy.sel({dim: mask_2d[dim] for dim in mask_2d.dims})
```
If `wrap_lons` is True, the new xarray object will have a WGS84 CRS with -180/+180 lons. It's not pretty, but it seems to work.
Hmmm, I don't have a suggestion at the moment. Would probably have to get pretty deep into the weeds before I could do so.
FYI, there is an example of converting lons to -180..180 with xarray: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.assign_coords.html
The `.sortby(self.x_dim)` is something that is sometimes necessary for geospatially referencing the data: https://github.com/corteva/rioxarray/issues/58#issuecomment-612062722
Had to do this a couple of weeks ago; past advice I have seen and/or used was to chop it in half, so I did this:

```python
worldmagright = worldmag.where((worldmag.x >= 0) & (worldmag.x < 180), drop=True)
worldmagleft = worldmag.where((worldmag.x >= 180) & (worldmag.x <= 360), drop=True)
worldmagleft = worldmagleft.assign_coords(x=worldmagleft.x - 360)
worldmagnew = xarray.concat([worldmagleft, worldmagright], "x")
```

But that leaves more intermediate objects lying around, so an all-in-one method would be nicer.
But what about the transform in the above?