h3ronpy
Reverse (H3 set → raster)
Could this library do the reverse operation for raster conversion? I.e., given a set of H3 indices (at the same, or perhaps mixed, resolutions) and some property (e.g. elevation), produce a raster output.
Naïvely, this could be done by converting the H3 set to a consistent resolution, converting to geo boundaries, and then rasterising. However, I'm particularly thinking about a hypothetically efficient approach that avoids expanding the compacted (mixed-resolution) set of H3 indices to a common resolution before rasterisation (if that is actually possible).
I think one way to approach this would be to find the set of pixels covering the region (the extent of the input H3 set), compute each pixel's H3 index at the highest appropriate resolution (the highest resolution present in the input set), and then, for each H3 cell in the input set, find the pixels that intersect using the H3 API. (There'd be undefined behaviour if the input H3 set included overlaps.) This avoids having to uncompact the input set, but whether that's actually more efficient than simply computing the H3 index for each pixel and matching it against the input set is unclear to me.
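For concreteness, a rough sketch of that per-pixel lookup idea might look like the following. This is purely illustrative and not part of h3ronpy; it assumes the h3-py v3 API and a rasterio affine transform, and lookup_raster / cells_to_values / finest_res are made-up names:

import numpy as np
import h3.api.numpy_int as h3
from rasterio.transform import xy

def lookup_raster(cells_to_values, finest_res, transform, shape, nodata=0):
    # cells_to_values: mapping of H3 index (possibly mixed resolutions) -> value
    input_resolutions = sorted({h3.h3_get_resolution(c) for c in cells_to_values})
    arr = np.full(shape, nodata)
    for row in range(shape[0]):
        for col in range(shape[1]):
            lng, lat = xy(transform, row, col)  # pixel center coordinates
            pixel_cell = h3.geo_to_h3(lat, lng, finest_res)
            # walk up from the finest resolution until a cell of the input set is hit
            for res in reversed(input_resolutions):
                parent = h3.h3_to_parent(pixel_cell, res)
                if parent in cells_to_values:
                    arr[row, col] = cells_to_values[parent]
                    break
    return arr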
I'm not even convinced that this idea makes sense, given H3's hierarchical non-containment (children at resolutions more than 2 levels finer than a parent may lie entirely outside the (grand)parent's boundary). But since the conversion raster → H3 set is possible and sensible, I assume the reverse is, too.
At this point, the H3 → raster conversion has not been implemented.
The naive approach you mention is already possible using gdal_rasterize, but - as you mention - it may not be very efficient, as a large number of cells may be created when mixed H3 resolutions are used.

Given a lower H3 resolution R and a higher resolution R+N, I see no way to generate the R+N outline of an R-cell without generating all the child cells. A possible optimization would be to avoid doing this resolution homogenization eagerly - which may create quite large datasets - and instead do it lazily for each cell just when it is passed to the rasterization. This could be implemented using change_cell_resolution from within Rust. Given that, it would still require part of the rasterization code to be ported to Rust, as I would like to avoid having the GDAL dependency in the Rust implementation.
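Purely as an illustration of that lazy homogenization (in Python rather than Rust, with iter_cells_at_resolution as a made-up name; the Rust change_cell_resolution iterator would play the analogous role), the children of each compacted cell could be generated one at a time so the fully uncompacted set is never held in memory at once:

import h3.api.numpy_int as h3

def iter_cells_at_resolution(cells, target_res):
    # lazily yield the target_res children of each cell, one input cell at a time,
    # instead of materializing the whole uncompacted set up front
    for cell in cells:
        res = h3.h3_get_resolution(cell)
        if res == target_res:
            yield cell
        elif res < target_res:
            for child in h3.h3_to_children(cell, target_res):
                yield child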
Another way to tackle this while staying in Python would be to apply h3_set_to_multi_polygon to the R+N-children of each cell. This results in a MultiPolygon geometry containing a single Polygon with the R+N outline of each cell, which could be assembled using geopandas and written to a vector dataset afterwards. That dataset could then be used with gdal_rasterize or rasterio.features.rasterize to generate the raster. This approach would also have the benefit of not requiring all child cells to be in memory at once.
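A rough sketch of this (assuming the h3-py v3 API together with shapely and geopandas; cell_outline_gdf and cells_to_values are illustrative names) could look like the following; the resulting GeoDataFrame could then be passed to rasterio.features.rasterize or written to a vector file for gdal_rasterize:

import geopandas as gpd
import h3.api.numpy_int as h3
from shapely.geometry import Polygon

def cell_outline_gdf(cells_to_values, target_res):
    geometries = []
    values = []
    for cell, value in cells_to_values.items():
        children = h3.h3_to_children(cell, target_res)
        # merging the children yields a MultiPolygon containing a single Polygon:
        # the target_res outline of the original cell (geo_json=True -> lng/lat order)
        multi_polygon = h3.h3_set_to_multi_polygon(children, geo_json=True)
        geometries.append(Polygon(multi_polygon[0][0]))  # outer ring only
        values.append(value)
    return gpd.GeoDataFrame({"value": values, "geometry": geometries}, crs="EPSG:4326")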
Let me know if this helps. I currently do not see myself getting around to implementing any rasterization within this library, but I am happy to accept a PR.
You're right, of course there's no way to avoid computing all of the child cells; the best-case optimisation is just not to put all of them (the complete de-compaction) into memory at once. I'm currently unable to contribute this to the project, since I've never used Rust, but I appreciate the discussion anyway.
I played around with rasterizing dataframes a few days ago, so maybe this is of some help for you. It's not the fastest for large numbers of cells, but it works for the few tests I have done. Rasterization is done using rasterio. The part that builds the child cells is still missing, but it could easily be added in the for loop in rasterize_dataframe without requiring all un-compacted child cells to be built at once:
import pandas as pd
import typing
import h3.api.numpy_int as h3
import numpy as np
from h3ronpy.util import dataframe_to_geodataframe
from rasterio.transform import from_bounds
from rasterio.features import rasterize
import rasterio


def sample_df() -> pd.DataFrame:
    # build a small sample dataframe: cells around a center cell with their k-ring distance
    distances = []
    h3indexes = []
    for i_distance, i_h3indexes in enumerate(h3.hex_range_distances(h3.geo_to_h3(12.3, 43.1, 12), 30)):
        distances.extend(len(i_h3indexes) * [i_distance, ])
        h3indexes.extend(i_h3indexes)
    return pd.DataFrame({
        "h3index": h3indexes,
        "distance": distances
    })


def rasterize_dataframe(df: pd.DataFrame, size: typing.Tuple[int, int], h3index_column_name="h3index",
                        value_column_name="value", nodata=0):
    # convert the h3 indexes to cell polygon geometries
    gdf = dataframe_to_geodataframe(df[[h3index_column_name, value_column_name]], column_name=h3index_column_name)
    transform = from_bounds(*gdf.total_bounds, *size)
    arr = np.full(size, nodata, dtype=df[value_column_name].dtype)
    # burn each cell polygon into the output array using its value
    for row in gdf.iterrows():
        rasterize([row[1]["geometry"], ], out_shape=size, out=arr, default_value=row[1][value_column_name],
                  fill=nodata, transform=transform)
    return arr, transform


def run():
    df = sample_df()
    df["distance"] = df["distance"].astype(np.uint8)
    size = (1000, 1000)
    nodata = 255
    arr, transform = rasterize_dataframe(df, size, value_column_name="distance", nodata=nodata)
    assert df['distance'].dtype == arr.dtype
    with rasterio.open("test.tif", 'w', driver="GTiff", width=size[0], height=size[1], count=1,
                       dtype=arr.dtype, nodata=nodata, compress='lzw', transform=transform) as ds:
        ds.write(arr, 1)


if __name__ == '__main__':
    run()
It could even be parallelized by splitting the dataframe into chunks (within rasterize_dataframe), which could then be rasterized in sub-processes into individual numpy arrays. In the end these numpy arrays could be merged into a single one.
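A hypothetical sketch of that chunked variant (the names _rasterize_chunk, rasterize_parallel and n_chunks are made up for illustration, and letting non-nodata pixels of later chunks overwrite earlier ones is just one possible merge rule) could be:

from concurrent.futures import ProcessPoolExecutor
from functools import partial
import numpy as np
from rasterio.features import rasterize

def _rasterize_chunk(gdf_chunk, size, transform, value_column_name, nodata):
    # rasterize one chunk of the geodataframe into its own array
    arr = np.full(size, nodata, dtype=gdf_chunk[value_column_name].dtype)
    for _, row in gdf_chunk.iterrows():
        rasterize([row["geometry"]], out_shape=size, out=arr, default_value=row[value_column_name],
                  fill=nodata, transform=transform)
    return arr

def rasterize_parallel(gdf, size, transform, value_column_name="value", nodata=0, n_chunks=4):
    chunks = np.array_split(gdf, n_chunks)
    worker = partial(_rasterize_chunk, size=size, transform=transform,
                     value_column_name=value_column_name, nodata=nodata)
    with ProcessPoolExecutor() as executor:
        partial_arrays = list(executor.map(worker, chunks))
    # merge the partial rasters: non-nodata pixels from later chunks overwrite earlier ones
    merged = partial_arrays[0]
    for part in partial_arrays[1:]:
        mask = part != nodata
        merged[mask] = part[mask]
    return merged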
If this would solve the problem, maybe we can add it to the python-parts of this library.