Aggregate regions within regionmask
I am opening this issue to keep track of a feature that was discussed in #90: The aggregation of smaller regions into larger ones. One way to implement this for the raster masks is implemented in cmip6_preprocessing (here).
@mathause suggested that this could be implemented on a lower lever, combining the actual polygons:
import regionmask
basins = regionmask.defined_regions.natural_earth.ocean_basins_50
basins_comb = basins.combine(pacific=["p_S", "p_N"])
I think this would be a great feature. Having the flexibility to combine any set of regions would be aweseome. It is also worth thinking about having some 'default aggregations' (like e.g. large ocean basins) available. This could be a cheap way to extend the 'built in' regions of regionmask.
geopandas has a nice feature for that: https://geopandas.org/aggregation_with_dissolve.html?highlight=dissolve
Ah very cool. The only concern I would have is that has been difficult in terms of dependencies, so it might be a risky dependency for regionmask? Could always be optional.
geopandas and cartopy (and thus fiona and gdal) are already required dependencies...
This would kind of require a to_geopandas method (which is anyway a good idea #50). I haven't used dissolve yet - it needs a column with equal values to aggregate over, so this would have to be constructed. But yes might be easier than a dedicated function within regionmask.
I add an example to construct the EUR-28 regions this is done outside of regionmask.
import regionmask
from cartopy.io import shapereader
import geopandas
from shapely.geometry import box
# read natural_earth dataframe
resolution="50m"
category="cultural"
name="admin_0_countries"
shpfilename = shapereader.natural_earth(resolution, category, name)
df = geopandas.read_file(shpfilename, encoding="utf8")
# select countries
# TODO: add all countries
sel = df.NAME.isin(["Germany", "Austria", "France"])
df_EURO = df.loc[sel]
# get rid of overseas territories
# TODO: check box
bbox = geopandas.GeoDataFrame(geometry=[box(-25, 0, 30, 90)])
df_EURO_res = geopandas.overlay(df_EURO, bbox, how="intersection")
# create one MultiPolygon
multipoly = df_EURO_res.unary_union
# create a region
eur28 = regionmask.Regions(
[df_EURO_res.unary_union],
names="Euro-28",
abbrevs="EUR28",
name="EURO-28 based on naturalearth"
)
eur28.plot()
so you would recommend aggregating inside geopandas and not regionmask
regionmask cannot aggregate Polygons - the only way would be to combine the mask e.g.
da.where(mask == 2 | mask == 3)
thus depends a bit on what you want to achieve