[PR Proposal] Add support for min_resolution in compactCells
Hi, I'm proposing to add an optional minimum resolution to the compactcells function. Currently the function always tries to compact up to resolution 0. For database joins where range queries are inefficient we need to join on equality at an intermediary resolution then post filter for exact min/max checks at a higher resolution.
For example in this databricks article they do a join using parent resolution 8 but this requires that no cells got compacted to a lower resolution than 8.
SELECT
FROM
trip_h3 as t,
(SELECT
h3_toparent(c_cell, 8) AS cell_8, *
FROM taxi_zone_h3c_explode) AS tz
WHERE
(tz.h3_res = 8 AND t.pickup_cell_8 = tz.cell_8) OR
(t.pickup_cell_8 = tz.cell_8 AND h3_ischildof(t.pickup_cell, tz.c_cell))
(sql query from the article)
If we want to "limit" the minimum resolution of the compacted cells currently we'd have to do something like
cells = polygonToCells(poly, xxxx)
compactedCells = compactCells(cells)
compacteCellsWithMinResolution = flatMap(compactedCells, lambda cell : if (getResolution(cell) < 8) {cellToChildren(cell, 8) else {cell})
(excuse my pseudocode :) )
I think it would be a pretty minor change as the compact Cells function just hardcodes 0 as the min resolution right now. We could allow end users to specify their required min resolition.
Happy to contribute this myself as my first PR if you're open the feature.
Here's the python code snippet instead of my pseudocode
import itertools
import h3.api.numpy_int as h3
...
tiled = h3.h3shape_to_cells(shape, max_resolution).tolist()
compacted = h3.compact_cells(tiled)
compacted_min_resolution = list(itertools.chain.from_iterable(
[h3.cell_to_children(cell, min_resolution) if h3.get_resolution(cell) < min_resolution else [cell] for cell in compacted]
))
I was hoping @isaacbrodsky was going to chime in on this. My personal opinion is that it's a good idea, but it should be a new function so we don't have to major-version-bump the C library again. Perhaps partialCompactCells(cell_set, compacted_set, num_cells, min_resolution)
This makes sense to me. I get the utility of not needing to chain a bunch of operations together at the end.
I like the suggestion from @dfellis to add this as a new function. The existing function can naturally be implemented in terms of the new function.