rioxarray
Handling of DataArrays using longitudes from 0 to 360
I'm currently working with a large set of CF-Compliant data from CORDEX and CMIP5 and attempting to integrate some of the neat features of rioxarray into a project that performs all manner of analyses on xarray-readable data.
I've followed some of the advice on adding a projection to the netCDF4 file; the code that seems to assuage the issues looks similar to:
```python
import geojson
import rasterio.crs
import rioxarray
import xarray

nc = "/some/netcdf/tas.nc"

g = list()
with open("/foo/bar/file.geojson") as gj:
    g.append(geojson.load(gj)[0]["geometry"])

ds = xarray.open_dataset(nc)
crs = rasterio.crs.CRS.from_epsg(4326)

dss = ds.tas
dss = dss.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
dss = dss.rio.write_crs(crs.to_dict())
clipped = dss.rio.clip(g, crs=crs, all_touched=True)
```
I think I'm running into an issue where, although the data is not projected, the longitude values run from 0 to 360 starting at Greenwich. Seeing as most climate data seems to follow `lat_bnds` -> (-90, +90) and `lon_bnds` -> (0, +360), I'm wondering whether there's anything formally within rioxarray to deal with this.

A work-around is to subtract 360 from all lons > 180, re-assign these new lons to `ds.lon`, then use `ds.sortby("lon")`. This seems a bit "hacky" but works for now. Is there a better way?
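A minimal sketch of that work-around, with a toy dataset standing in for the real file (the coordinate name `lon` is assumed to match):

```python
import numpy as np
import xarray

# toy dataset with 0-360 longitudes, as in typical CMIP/CORDEX output
ds = xarray.Dataset(
    {"tas": (("lat", "lon"), np.zeros((3, 8)))},
    coords={"lat": [-45.0, 0.0, 45.0], "lon": np.arange(0.0, 360.0, 45.0)},
)

# shift longitudes > 180 down by 360, then sort so they run -180..180
ds = ds.assign_coords(lon=xarray.where(ds.lon > 180, ds.lon - 360, ds.lon))
ds = ds.sortby("lon")
```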
> I'm wondering whether there's anything formally within rioxarray to deal with this.
Unfortunately, we don't currently support that as far as I am aware.
Side note, I would recommend doing this:

```python
dss = dss.rio.write_crs(4326)         # option 1
dss = dss.rio.write_crs("epsg:4326")  # option 2
```
The `to_dict()` call produces PROJ parameters, which can be lossy. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems.
In the project I'm working on, I've managed to figure out a reliable means of dealing with this problem. I can't promise that my approach is pretty, but is this a feature you would be interested in integrating into rioxarray?
Sounds useful. I have had to do this in the past. Here it is for reference: https://github.com/snowman2/pangaea/blob/a304e9a489cfc0bc1c74e7cb50c3335a4f3d596f/pangaea/xlsm.py#L297-L298
If you would like to submit a PR, I think it would be useful to have.
I have seen other issues where this would be useful:
- https://gis.stackexchange.com/questions/357246/clip-global-data-by-polygon-using-rioxarray-fails-off-by-180-longitude
- https://gis.stackexchange.com/questions/37790/how-to-reproject-raster-from-0-360-to-180-180-with-cutting-180-meridian
- https://gis.stackexchange.com/questions/348172/changing-projection-from-0-360-to-180-180-for-a-simple-raster
@Zeitsperre is this something you still have interest in implementing?
Maybe something like:

```python
self._obj.assign_coords(
    {self.x_dim: (((self._obj[self.x_dim] + 180) % 360) - 180)}
).sortby(self.x_dim)
```
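For reference, the same expression applied outside the accessor, on a toy DataArray (the dimension name `x` is just for illustration):

```python
import numpy as np
import xarray

da = xarray.DataArray(
    np.arange(8.0),
    dims="x",
    coords={"x": np.arange(0.0, 360.0, 45.0)},
)

# ((lon + 180) % 360) - 180 maps 0..360 onto -180..180 in one pass;
# sortby then restores monotonically increasing coordinates
da = da.assign_coords(x=(((da.x + 180) % 360) - 180)).sortby("x")
```

Note that 180 itself maps to -180 with this formula, which is usually fine for global grids.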
Hi @snowman2, yes, this functionality is still interesting to us. Thanks for following up, and sorry for not getting around to contributing something.
We've been building out our version for the last few months. The main need for it is precisely to deal with the first case you identified on GIS SE. The implementation we have now uses a GeoPandas spatial join, or shapely's vectorized operations (based on the approach that regionmask uses), to map out a vector using the grid dimensions from a target xarray object. When the vector overlaps Greenwich (i.e. exceeds the bounds of a 0-360 DataArray), that's when we perform the subset and reassign coordinates.
The biggest concern I can see is the speed of performing this for very large multi-file datasets; I imagine it's quicker to treat the shape and clip first, then perform the longitude reprojection. How do you think one would go about optimizing this?
@Zeitsperre can you provide a simplified code-example of your workflow?
The code needs a refactor, as it relies on some decorators performing workarounds to get shapefiles to be more consistent with grid data. For instance, the code below relies on a function that establishes whether a shapefile's bounds cross the Greenwich meridian and, if so, splits the shapes that fall along it into multipolygons and applies a value of +360 to all negative lons, since shape operations are less intensive than grid operations.
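That preprocessing step could be sketched roughly like this with shapely (the helper name, the bounding boxes, and the straddling-only handling are my assumptions, not the project's actual code):

```python
from shapely.geometry import MultiPolygon, Polygon, box

def wrap_to_0_360(poly):
    """Split a polygon at Greenwich and shift the western half by +360.

    Hypothetical helper for illustration; handles only the simple case of a
    single polygon straddling the meridian.
    """
    west = poly.intersection(box(-180, -90, 0, 90))
    east = poly.intersection(box(0, -90, 180, 90))
    if west.is_empty:  # nothing west of Greenwich: no wrapping needed
        return poly
    # apply +360 to the negative lons so the shape lines up with a 0-360 grid
    west_shifted = Polygon([(x + 360, y) for x, y in west.exterior.coords])
    return MultiPolygon([east, west_shifted])

# a polygon straddling the Greenwich meridian
poly = Polygon([(-10, 40), (10, 40), (10, 50), (-10, 50)])
wrapped = wrap_to_0_360(poly)
```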
Given the x and y dimensions of an xarray object (`ds_copy`) and a shapefile (`poly`):
```python
if len(x_dim.shape) == 1 and len(y_dim.shape) == 1:
    # create a 2d grid of lon, lat values
    lon1, lat1 = np.meshgrid(
        np.asarray(x_dim.values), np.asarray(y_dim.values), indexing="ij"
    )
    dims_out = x_dim.dims + y_dim.dims
    coords_out = dict()
    coords_out[dims_out[0]] = x_dim.values
    coords_out[dims_out[1]] = y_dim.values
else:
    lon1 = x_dim.values
    lat1 = y_dim.values
    dims_out = x_dim.dims
    coords_out = x_dim.coords

# create pandas DataFrame from NetCDF lat and lon points
df = pd.DataFrame(
    {"id": np.arange(0, lon1.size), "lon": lon1.flatten(), "lat": lat1.flatten()}
)
df["Coordinates"] = list(zip(df.lon, df.lat))
df["Coordinates"] = df["Coordinates"].apply(Point)

# create GeoDataFrame (spatially referenced with shifted longitude values if needed)
if wrap_lons:
    wgs84 = CRS.from_string(
        "+proj=longlat +datum=WGS84 +no_defs +type=crs +lon_wrap=180"
    )
gdf_points = gpd.GeoDataFrame(df, geometry="Coordinates", crs=wgs84)

# spatial join geodata points with region polygons and remove duplicates
point_in_poly = gpd.tools.sjoin(gdf_points, poly, how="left", op="intersects")
point_in_poly = point_in_poly.loc[~point_in_poly.index.duplicated(keep="first")]

# extract polygon ids for points
mask = point_in_poly["index_right"]
mask_2d = np.array(mask).reshape(lat1.shape[0], lat1.shape[1])
mask_2d = xarray.DataArray(mask_2d, dims=dims_out, coords=coords_out)

# loop through variables
for v in ds_copy.data_vars:
    if set.issubset(set(mask_2d.dims), set(ds_copy[v].dims)):
        ds_copy[v] = ds_copy[v].where(mask_2d.notnull())

# remove coordinates where all values are outside of region mask
for dim in mask_2d.dims:
    mask_2d = mask_2d.dropna(dim, how="all")
ds_copy = ds_copy.sel({dim: mask_2d[dim] for dim in mask_2d.dims})
```
If `wrap_lons` is True, the new xarray object will have a WGS84 CRS with -180/+180 lons. It's not pretty, but it seems to work.
Hmmm, I don't have a suggestion at the moment. Would probably have to get pretty deep into the weeds before I could do so.
FYI, there is an example of converting lons to -180..180 with xarray: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.assign_coords.html
The `.sortby(self.x_dim)` is something that is sometimes necessary for geospatially referencing the data: https://github.com/corteva/rioxarray/issues/58#issuecomment-612062722
Had to do this a couple of weeks ago; past advice I have seen and/or used was to chop it in half, so I did this:

```python
worldmagright = worldmag.where((worldmag.x >= 0) & (worldmag.x < 180), drop=True)
worldmagleft = worldmag.where((worldmag.x >= 180) & (worldmag.x <= 360), drop=True)
worldmagleft = worldmagleft.assign_coords(x=worldmagleft.x - 360)
worldmagnew = xarray.concat([worldmagleft, worldmagright], "x")
```

But that leaves more intermediate objects lying around, so an all-in-one method would be nicer.
But what about the transform in the above?