geocube
Support dask-geopandas
@snowman2 mentioned using dask-geopandas to rasterize in chunks with Dask. I just tried it, and it doesn't seem to work out of the box. Here is an example:
from functools import partial
import geopandas
import dask_geopandas
from rasterio.enums import MergeAlg
from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_image
# download: wget https://raw.githubusercontent.com/drei01/geojson-world-cities/master/cities.geojson
df = geopandas.read_file(
    "cities.geojson",
    crs="epsg:4326",
)
ddf = dask_geopandas.from_geopandas(df, npartitions=4)
ddf["mask"] = 1
geo_grid = make_geocube(
    vector_data=ddf,
    resolution=(-0.01, 0.01),
    measurements=["mask"],
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
    fill=0,
)
Traceback
---------------------------------------------------------------------------
VectorDataError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 geo_grid = make_geocube(
2 vector_data=ddf,
3 resolution=(-0.05, 0.05),
4 measurements=['mask'],
5 rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
6 fill=0,
7 )
8 geo_grid
File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/api/core.py:99, in make_geocube(vector_data, measurements, datetime_measurements, output_crs, resolution, align, geom, like, fill, group_by, interpolate_na_method, categorical_enums, rasterize_function)
35 """
36 Rasterize vector data into an ``xarray`` object. Each attribute will be a data
37 variable in the :class:`xarray.Dataset`.
(...)
95
96 """
97 geobox_maker = GeoBoxMaker(output_crs, resolution, align, geom, like)
---> 99 return VectorToCube(
100 vector_data=vector_data,
101 geobox_maker=geobox_maker,
102 fill=fill,
103 categorical_enums=categorical_enums,
104 ).make_geocube(
105 measurements=measurements,
106 datetime_measurements=datetime_measurements,
107 group_by=group_by,
108 interpolate_na_method=interpolate_na_method,
109 rasterize_function=rasterize_function,
110 )
File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/vector_to_cube.py:79, in VectorToCube.__init__(self, vector_data, geobox_maker, fill, categorical_enums)
53 def __init__(
54 self,
55 vector_data: Union[str, os.PathLike, geopandas.GeoDataFrame],
(...)
58 categorical_enums: Optional[Dict[str, List]],
59 ):
60 """
61 Initialize the GeoCube class.
62
(...)
77
78 """
---> 79 self._vector_data = load_vector_data(vector_data)
80 self._geobox = geobox_maker.from_vector(self._vector_data)
81 self._grid_coords = affine_to_coords(
82 self._geobox.affine, self._geobox.width, self._geobox.height
83 )
File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/geo_utils/geobox.py:73, in load_vector_data(vector_data)
71 raise VectorDataError("Empty GeoDataFrame.")
72 if "geometry" not in vector_data.columns:
---> 73 raise VectorDataError(
74 "'geometry' column missing. Columns in file: "
75 f"{vector_data.columns.values.tolist()}"
76 )
78 # make sure projection is set
79 if not vector_data.crs:
VectorDataError: 'geometry' column missing. Columns in file: [0, 1, 2]
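In the meantime, one stopgap that sidesteps the error (assuming the data fits in memory, and of course giving up the chunked rasterization entirely) is to materialize the dask-geopandas frame back into a plain GeoDataFrame before calling make_geocube:

gdf = ddf.compute()  # collapse the dask partitions into a regular geopandas.GeoDataFrame
geo_grid = make_geocube(
    vector_data=gdf,
    resolution=(-0.01, 0.01),
    measurements=["mask"],
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
    fill=0,
)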
Sounds like a good idea. An MR adding support for dask-geopandas is welcome.
I question whether this is the best approach, as my (admittedly very limited) understanding is that a dask-geopandas GeoDataFrame is chunked along an index rather than into pieces of roughly equal spatial extent, so a single large feature could blow things up.
I think that's a fair concern, @jessjaco. I suspect an important preliminary step would be to perform a spatial_shuffle on the dask-geopandas GeoDataFrame. It's not so much an issue of single large features as of how your features are distributed over the larger region. Of course a single large feature could still trip you up, but perhaps that's down to the user to manage, as only they know how they'd want to subdivide their features.
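For anyone who wants to experiment, here is a rough, untested sketch of that idea: spatial_shuffle the dask-geopandas frame, rasterize each partition onto a common grid, and sum the results. The per-partition loop, the total-bounds geom, and the final summation are my own assumptions rather than anything geocube does today, and summing is only valid here because merge_alg is MergeAlg.add with fill=0.

import json
from functools import partial, reduce

import dask_geopandas
import geopandas
from rasterio.enums import MergeAlg
from shapely.geometry import box, mapping

from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_image

df = geopandas.read_file("cities.geojson")
df["mask"] = 1

ddf = dask_geopandas.from_geopandas(df, npartitions=4)
ddf = ddf.spatial_shuffle()  # repartition so spatially close features share a partition

# One geometry covering the full extent, so every partition rasterizes onto the same grid.
geom = json.dumps(mapping(box(*df.total_bounds)))

grids = []
for part in ddf.to_delayed():
    gdf_part = part.compute()  # each partition is a plain geopandas.GeoDataFrame
    grids.append(
        make_geocube(
            vector_data=gdf_part,
            geom=geom,
            resolution=(-0.01, 0.01),
            measurements=["mask"],
            rasterize_function=partial(
                rasterize_image, merge_alg=MergeAlg.add, all_touched=True
            ),
            fill=0,
        )
    )

# Because merge_alg is MergeAlg.add and fill is 0, the partition grids can
# simply be summed to reproduce the single-pass result.
geo_grid = reduce(lambda a, b: a + b, grids)

If this holds up, the per-partition calls could also be wrapped in dask.delayed so the rasterization runs in parallel instead of sequentially.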