geocube icon indicating copy to clipboard operation
geocube copied to clipboard

DOC: Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc.

Open bw4sz opened this issue 4 years ago • 11 comments

Hi this is my first foray into geocube. Thanks for your work here, clearly a good idea.

I'd like to add an example usage that I believe should be possible, but not just quite getting right.

Is it possible to count the number of polygons in a cell, or perform operations on the geocube pixels during rasterization? Imagine a geopandas dataframe with many polygons. For each cell, perform some numpy operation (like rasterstats does) on the incoming polygons. I can perform pandas like operation after creating a geocube, but it doesn't preserve the spatial object. I feel like the group_by argument is in this direction, but could not succeed.

Something like

#Turn each set of predictions into a raster
import geopandas as gpd
from shapely.geometry import Polygon
from geocube.api.core import make_geocube

#a numeric column to count
g["mask"] = 1

cube = make_geocube(vector_data=g, resolution=(-50, 50), function=mask.count())

To create a heatmap of number of polygons in each cell.

Here is a notebook to help illustrate use case (sorry the geocube svg doesn't play well with ipython on github).

https://github.com/weecology/NEON_crown_maps/blob/master/notebooks/Create%20a%20raster%20of%20polygon%20counts.ipynb

bw4sz avatar Jun 18 '20 20:06 bw4sz

I think it would go right here

https://github.com/corteva/geocube/blob/2b42f470255e79eb9b7f94100a33c1b0c08fcccf/geocube/vector_to_cube.py#L156

bw4sz avatar Jun 18 '20 20:06 bw4sz

Should probably add better documentation around this, but you can pass in a custom rasterization function (examle in docs):

from functools import partial

from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image
from geocube.api.core import make_geocube

cube = make_geocube(
...
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add),
    fill=0,
)

With this method, each cell should contain the count of the number of times a polygon overlapped the cell.

snowman2 avatar Jun 18 '20 21:06 snowman2

Awesome, i'll try to mock something up and submit a PR with a notebook to docs.

On Thu, Jun 18, 2020 at 2:20 PM Alan D. Snow [email protected] wrote:

Should probably add better documentation around this, but you can pass in a custom rasterization function (examle in docs https://corteva.github.io/geocube/stable/examples/timestamp_missing_data.html ):

from functools import partial from rasterio.enums import MergeAlgfrom geocube.rasterize import rasterize_imagefrom geocube.api.core import make_geocube cube = make_geocube( ... rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add) )

With this method, each cell should contain the count of the number of times a polygon overlapped the cell.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/corteva/geocube/issues/28#issuecomment-646311502, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJHBLBHZIVSM7XR55M6FSLRXKAJHANCNFSM4OCEMGIA .

-- Ben Weinstein, Ph.D. Postdoctoral Fellow University of Florida http://benweinstein.weebly.com/

bw4sz avatar Jun 18 '20 21:06 bw4sz

I'm still debugging, i'll update when I fix.

from functools import partial
from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image

g =
gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100),
 measurements=['mask'], rasterize_function=partial(rasterize_image,
merge_alg=MergeAlg.add))
cube.plot()

isn't quite right, the cube is blank

cube
<xarray.Dataset>
Dimensions:      (x: 10, y: 10)
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Data variables:
    left         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    bottom       (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    right        (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    top          (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    height       (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    area         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    mask         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
Attributes:
    grid_mapping:  spatial_ref

cube.plot()
Traceback (most recent call last):
  Python Shell, prompt 30, line 1
  File
"/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py",
line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an
explicit plot method, e.g. ds.plot.scatter(...)

or specifying measurements=['mask']

cube.mask
<xarray.DataArray 'mask' (y: 10, x: 10)>
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Attributes:
    name:          mask
    long_name:     mask
    grid_mapping:  spatial_ref
    _FillValue:    nan

bw4sz avatar Jun 18 '20 21:06 bw4sz

Try using fill=0

snowman2 avatar Jun 19 '20 00:06 snowman2

I wasn't sure where to place, but tried both, no change.

g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, fill=0))
cube.plot()
cube.rio.to_raster("/Users/Ben/Desktop/data1.tif")
Traceback (most recent call last):
  Python Shell, prompt 41, line 1
  File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)
cube
<xarray.Dataset>
Dimensions:      (x: 10, y: 10)
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Data variables:
    mask         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan nan
Attributes:
    grid_mapping:  spatial_ref
g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , fill=0, rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add))
cube.plot()
cube

/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/datacube/utils/geometry/_base.py:301: DeprecationWarning: Please use `str(crs)` instead of `crs.crs_str`
  warnings.warn("Please use `str(crs)` instead of `crs.crs_str`", category=DeprecationWarning)
Traceback (most recent call last):
  Python Shell, prompt 43, line 1
  File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)

here is the data link if its useful. I also tried removing the measurements arg. No change. Not 100%, but just using the plot error as an indicator of success.

https://www.dropbox.com/s/jwxlldua58n82du/2018_BART_4_320000_4881000_image.shp?dl=0

bw4sz avatar Jun 19 '20 03:06 bw4sz

The second one is correct. I suggest doing cube.mask.plot()

snowman2 avatar Jun 19 '20 03:06 snowman2

yup. that did it. I'm away for the weekend, but i'll try to fork and submit a notebook PR to add as a resource on monday. Thanks.

On Thu, Jun 18, 2020 at 8:12 PM Alan D. Snow [email protected] wrote:

The second one is correct. I suggest doing cube.mask.plot()

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/corteva/geocube/issues/28#issuecomment-646411054, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJHBLB6QF7EZKAYMWMTV5DRXLJSNANCNFSM4OCEMGIA .

-- Ben Weinstein, Ph.D. Postdoctoral Fellow University of Florida http://benweinstein.weebly.com/

bw4sz avatar Jun 19 '20 03:06 bw4sz

I was trying to do something similar, with the polygons of all cities in the world, and I would expect to get a count of cities in each cell, but it looks like I only get 0/1 values. Here is an example notebook:

heatmap.ipynb.gz

davidbrochart avatar Jul 28 '22 15:07 davidbrochart

Fixed by passing all_touched=True:

rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True)

davidbrochart avatar Jul 29 '22 08:07 davidbrochart

Thanks @snowman2 for your work here and others for asking questions. This helped me this evening.

Not to myself to make a PR tomorrow to add "Examples - count points in grid cells" to https://corteva.github.io/geocube/stable/examples/examples.html

raybellwaves avatar Aug 24 '23 01:08 raybellwaves