Allow dict subclasses in function input
Occasionally it can be helpful to subclass the built-in dict. The geojson package, for example, has a Polygon class that subclasses dict and provides helpers like validating that the contained geometry is valid geojson. However this doesn't work with h3.polyfill:
import h3
from geojson import Polygon
polygon_coords = [[[-9.092559814453125,38.794500078219826],[-9.164314270019531,38.793429729760994],[-9.217529296875,38.76666579487878],[-9.216842651367188,38.68792166352608],[-9.12139892578125,38.70399894245585],[-9.0911865234375,38.74551518488265],[-9.092559814453125,38.794500078219826]]]
polygon = Polygon(polygon_coords)
isinstance(polygon, dict) # True
h3.polyfill(polygon, res=8, geo_json_conformant=True)
# ---------------------------------------------------------------------------
# TypeError Traceback (most recent call last)
# /var/folders/zl/4x9vw27j60n7h41pd6ntb_pc0000gp/T/ipykernel_33807/4145491903.py in <module>
# ----> 1 h3.polyfill(polygon, res=8, geo_json_conformant=True)
# 2
#
# ~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/h3/api/_api_template.py in polyfill(geojson, res, geo_json_conformant)
# 522 unordered collection of H3Cell
# 523 """
# --> 524 mv = _cy.polyfill(geojson, res, geo_json_conformant=geo_json_conformant)
# 525
# 526 return _out_unordered(mv)
#
# TypeError: Argument 'geojson' has incorrect type (expected dict, got Polygon)
Originally I figured h3-py was doing an assert type(polygon) == dict check, that could easily be switched to isinstance(polygon, dict). But upon closer inspection it looks like this error is actually coming from Cython.
https://github.com/uber/h3-py/blob/c1bee7dff01363895f716ca1add92eb252c1f146/src/h3/_cy/geo.pyx#L195
So it looks like Cython requires the input to be a literal dict instead of a subclass.
The only prior work I could find relating to this is: https://github.com/fastavro/fastavro/issues/139
Possible resolutions
-
Mark as
wontfix; users need to coerce to a literaldict. -
Coerce input to
dictinsideh3-pybut before touching Cython. I think we could just change https://github.com/uber/h3-py/blob/c1bee7dff01363895f716ca1add92eb252c1f146/src/h3/api/_api_template.py#L525 tocy.polyfill(dict(geojson), res, geo_json_conformant=geo_json_conformant) -
Change cython function signature. I.e. change https://github.com/uber/h3-py/blob/c1bee7dff01363895f716ca1add92eb252c1f146/src/h3/_cy/geo.pyx#L195 to
ef polyfill(geojson, int res, bool geo_json_conformant=False):and then add an
assert isinstance(geojson, dict)line. Would removing thedictbeforegeojsonaffect performance at all?
Options 2 and 3 could be benchmarked versus baseline to see if the impact is significant either way.
Option 3 feels like the best option, since it wouldn't require a new object construction and Cython is likely doing those assert type(arg) === typename assertions under the hood already so swapping with an isinstance-based assertion ought to be immeasurably different (the vast majority of the time should be in the C code doing the polyfill operation).
Option 2 could also be tested as it's trivial to implement, but I personally wouldn't prioritize that one? You can get weird things like:
dict([]) # {}
dict([1, 2, 3, 4]) # TypeError: cannot convert dictionary update sequence element #0 to a sequence
dict([(1, 2), (3, 4)]) # {1: 2, 3: 4}
Just becomes a very unpredictable API if you try to convert the input to a dictionary.