geocube
geocube copied to clipboard
DOC: Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc.
Hi this is my first foray into geocube. Thanks for your work here, clearly a good idea.
I'd like to add an example usage that I believe should be possible, but not just quite getting right.
Is it possible to count the number of polygons in a cell, or perform operations on the geocube pixels during rasterization? Imagine a geopandas dataframe with many polygons. For each cell, perform some numpy operation (like rasterstats does) on the incoming polygons. I can perform pandas like operation after creating a geocube, but it doesn't preserve the spatial object. I feel like the group_by argument is in this direction, but could not succeed.
Something like
#Turn each set of predictions into a raster
import geopandas as gpd
from shapely.geometry import Polygon
from geocube.api.core import make_geocube
#a numeric column to count
g["mask"] = 1
cube = make_geocube(vector_data=g, resolution=(-50, 50), function=mask.count())
To create a heatmap of number of polygons in each cell.
Here is a notebook to help illustrate use case (sorry the geocube svg doesn't play well with ipython on github).
https://github.com/weecology/NEON_crown_maps/blob/master/notebooks/Create%20a%20raster%20of%20polygon%20counts.ipynb
I think it would go right here
https://github.com/corteva/geocube/blob/2b42f470255e79eb9b7f94100a33c1b0c08fcccf/geocube/vector_to_cube.py#L156
Should probably add better documentation around this, but you can pass in a custom rasterization function (examle in docs):
from functools import partial
from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image
from geocube.api.core import make_geocube
cube = make_geocube(
...
rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add),
fill=0,
)
With this method, each cell should contain the count of the number of times a polygon overlapped the cell.
Awesome, i'll try to mock something up and submit a PR with a notebook to docs.
On Thu, Jun 18, 2020 at 2:20 PM Alan D. Snow [email protected] wrote:
Should probably add better documentation around this, but you can pass in a custom rasterization function (examle in docs https://corteva.github.io/geocube/stable/examples/timestamp_missing_data.html ):
from functools import partial from rasterio.enums import MergeAlgfrom geocube.rasterize import rasterize_imagefrom geocube.api.core import make_geocube cube = make_geocube( ... rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add) )
With this method, each cell should contain the count of the number of times a polygon overlapped the cell.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/corteva/geocube/issues/28#issuecomment-646311502, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJHBLBHZIVSM7XR55M6FSLRXKAJHANCNFSM4OCEMGIA .
-- Ben Weinstein, Ph.D. Postdoctoral Fellow University of Florida http://benweinstein.weebly.com/
I'm still debugging, i'll update when I fix.
from functools import partial
from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image
g =
gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100),
measurements=['mask'], rasterize_function=partial(rasterize_image,
merge_alg=MergeAlg.add))
cube.plot()
isn't quite right, the cube is blank
cube
<xarray.Dataset>
Dimensions: (x: 10, y: 10)
Coordinates:
* y (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
* x (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
spatial_ref int64 0
Data variables:
left (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
bottom (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
right (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
top (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
height (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
area (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
mask (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
Attributes:
grid_mapping: spatial_ref
cube.plot()
Traceback (most recent call last):
Python Shell, prompt 30, line 1
File
"/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py",
line 162, in __call__
"Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an
explicit plot method, e.g. ds.plot.scatter(...)
or specifying measurements=['mask']
cube.mask
<xarray.DataArray 'mask' (y: 10, x: 10)>
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
Coordinates:
* y (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
* x (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
spatial_ref int64 0
Attributes:
name: mask
long_name: mask
grid_mapping: spatial_ref
_FillValue: nan
Try using fill=0
I wasn't sure where to place, but tried both, no change.
g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, fill=0))
cube.plot()
cube.rio.to_raster("/Users/Ben/Desktop/data1.tif")
Traceback (most recent call last):
Python Shell, prompt 41, line 1
File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
"Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)
cube
<xarray.Dataset>
Dimensions: (x: 10, y: 10)
Coordinates:
* y (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
* x (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
spatial_ref int64 0
Data variables:
mask (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan nan
Attributes:
grid_mapping: spatial_ref
g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , fill=0, rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add))
cube.plot()
cube
/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/datacube/utils/geometry/_base.py:301: DeprecationWarning: Please use `str(crs)` instead of `crs.crs_str`
warnings.warn("Please use `str(crs)` instead of `crs.crs_str`", category=DeprecationWarning)
Traceback (most recent call last):
Python Shell, prompt 43, line 1
File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
"Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)
here is the data link if its useful. I also tried removing the measurements arg. No change. Not 100%, but just using the plot error as an indicator of success.
https://www.dropbox.com/s/jwxlldua58n82du/2018_BART_4_320000_4881000_image.shp?dl=0
The second one is correct. I suggest doing cube.mask.plot()
yup. that did it. I'm away for the weekend, but i'll try to fork and submit a notebook PR to add as a resource on monday. Thanks.
On Thu, Jun 18, 2020 at 8:12 PM Alan D. Snow [email protected] wrote:
The second one is correct. I suggest doing cube.mask.plot()
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/corteva/geocube/issues/28#issuecomment-646411054, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJHBLB6QF7EZKAYMWMTV5DRXLJSNANCNFSM4OCEMGIA .
-- Ben Weinstein, Ph.D. Postdoctoral Fellow University of Florida http://benweinstein.weebly.com/
I was trying to do something similar, with the polygons of all cities in the world, and I would expect to get a count of cities in each cell, but it looks like I only get 0/1 values. Here is an example notebook:
Fixed by passing all_touched=True
:
rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True)
Thanks @snowman2 for your work here and others for asking questions. This helped me this evening.
Not to myself to make a PR tomorrow to add "Examples - count points in grid cells" to https://corteva.github.io/geocube/stable/examples/examples.html