Should the vectorized geo_to_h3 stay as unstable?
#147 introduced vectorized support for geo_to_h3, which sped things up by about 50% over a python for loop. It's currently under unstable, which raises a warning. It seems that geo_to_h3 gives accurate results, and that its API won't retroactively break in the future. @ajfriend do you agree?
Can I suggest that the functionality gets moved to api instead of unstable? To me, the warning from unstable implied that it may not give correct results, which does not seem to be the case.
Maybe we should update the warning, as unstable was meant to imply that the API will probably change in the future. (Happy to accept a PR on this, btw!)
Also, we're planning on a 4.0 release when the core library also moves to 4.0, so many of the function names will change anyway and break backwards compatibility. For 4.0, I would like to settle on how exactly we'd like to implement the vectorized functions. For example, it'd be nice if we could have them work like numpy functions and dispatch correctly to both vector and scalar arguments. (Also happy to accept PRs to play around in the unstable namespace to try out different approaches for this.)
For 4.0, I would like to settle on how exactly we'd like to implement the vectorized functions.
I think 4.0 is coming up pretty soon 😄 . I'd love to get more of the H3 API to have a vectorized counterpart. My current thoughts on this are:
-
Add more
.pxdfiles to prevent duplicating Cython wrapper code for vectorized functions. The existing approach for the few vectorized functions calls intoh3libdirectly. https://github.com/uber/h3-py/blob/034a41e6237a850dec7f984963d87603c9342810/src/h3/_cy/unstable_vect.pyx#L59-L60 This isn't horrible for simple functions likegeo_to_h3, but it's not ideal for other functions that require more Cython wrapper code aroundh3lib. Adding.pxddefinition files (that define headers for our "exported"cpdeffunctions) for our.pyximplementation files will allow us to import those.pyxfiles from other Cython files. (This would also be relevant for https://github.com/uber/h3-py/issues/152).So instead of calling
h3libdirectly, we'd call our own internal wrapper functions:rom .geo import geo_to_h3 .. ith nogil: for i in range(len(lat)): out[i] = geo_to_h3(lat[i], lng[i]) -
Manually defining vectorized functions is unavoidable. Unlike the API functions file in Python, the vectorized functions must be defined in Cython. I assume function reuse won't be possible and these functions will just have to be copied to the vectorized API.
-
Focus on supporting scalar and 1-D numpy data. Pure numpy ufuncs can be used on data of any dimension, but I think the extra complexity of trying to support more than 1D input might not be worth it. I.e.
h3_to_geowith 2D input would require 3D output. -
Add vectorized functions where easiest, first. Functions with uniform input and output dimension are easiest to support. A vectorized
h3_to_geo_boundarywould be harder because I assume the number of tuples returned per h3 cell is not constant. -
Don't shy away from depending on Numpy headers if necessary. If the vectorized Cython code would be a lot simpler to implement by importing Numpy Cython code, we should embrace that. (I'm not currently sure whether depending on Numpy would make things easier). There's surely a way to modify
CMakeLists.txtto compile the vectorized code only if Numpy headers are available, no? And Numpy will easily be available on CI
* Don't shy away from depending on Numpy headers if necessary. If the vectorized Cython code would be a lot simpler to implement by importing Numpy Cython code, we should embrace that. (I'm not currently sure whether depending on Numpy would make things easier). There's surely a way to modify `CMakeLists.txt` to compile the vectorized code only if Numpy headers are available, no? And Numpy will easily be available on CI
This last point has the potential issue of requiring anyone installing h3-py the "normal" way to have numpy installed (unless there's an "optional dependency" concept in Python, too?) It feels like it might break things for users in memory-constrained environments or that have some requirement to continue using an older numpy, etc.
Not sure how big of a problem that is, but one way to avoid it would be to have this repo publish two packages, one that's "pure" h3-py and another that includes the vectorization/numpy/etc stuff?
This last point has the potential issue of requiring anyone installing
h3-pythe "normal" way to have numpy installed (unless there's an "optional dependency" concept in Python, too?)
I don't think so. Not sure if by "normal" you mean via wheels or from source.
- From wheels: I'm pretty sure that a wheel compiled against Numpy headers would not require Numpy at runtime if that file isn't imported. It would be essentially the same as our current
numpy_intAPI where Numpy is an optional dependency, and so Numpy is only required if you importh3.api.numpy_int. - From source: I'm less familiar with
scikit-build, but in a "normal" Cython setup it would be possible to only compile the Numpy Cython files if Numpy headers can be found at compile time. I.e. in my previous experience, in thesetup.pyfile, you would import numpy and find the path to numpy's headers, then pass a list of Cython modules toext_modules. It wouldn't be hard to simply remove a file from what's passed toext_modulesif numpy couldn't be imported. I assumescikit-buildhas similar functionality.
- I'm less familiar with
scikit-build
From a cursory glance at the scikit-build docs it looks like it would be simple to update this CMakeLists.txt to conditionally add a Cython module only if Numpy was found at build time:
find_package(NumPy MODULE)
...
if (NumPy_FOUND)
add_cython_file(unstable_vect)
endif()
Got it, though Python being even less strict on optional dependencies than Node is surprising.
it looks like the v4.0.0b2 pre release has removed unstable.vect.geo_to_h3. the scalar version of the function has been renamed from geo_to_h3 to latlng_to_cell. unclear if there are vectorised versions anywhere, or if this is outside v4.0 scope