h3-py icon indicating copy to clipboard operation
h3-py copied to clipboard

Discuss typing of Cython API

Open kylebarron opened this issue 3 years ago • 2 comments

Making a new issue to come back to what the Cython API should look like for best performance.

I added a couple commits to show how the vectorized functions could call our internal cython API instead of calling in to h3lib directly. But note this benchmark comparing against the master branch:

Calling internal cython functions (commit 798da86 (#224) )

geo_to_h3 bench
76.5 ms ± 941 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
h3_to_parent bench
6.52 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
h3_get_resolution bench
2.39 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On master branch (commit https://github.com/uber/h3-py/commit/d1a5a3b9b2add6f970a840f154c51947d312cdd7)

geo_to_h3 bench
75.8 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
h3_to_parent bench
371 µs ± 52.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
h3_get_resolution bench
188 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

geo_to_h3 is the same speed in each, but h3_to_parent and h3_get_resolution is roughly an order of magnitude on this branch at 798da86 (#224).

I believe the reasons for this is

  1. the Cython geo_to_h3 signature is fully typed, while the parent signature is not. Additionally, the parent function accepts Python, not C types.

    https://github.com/uber/h3-py/blob/d1a5a3b9b2add6f970a840f154c51947d312cdd7/src/h3/_cy/geo.pyx#L17 https://github.com/uber/h3-py/blob/d1a5a3b9b2add6f970a840f154c51947d312cdd7/src/h3/_cy/cells.pyx#L133

  2. The parent and resolution functions do extra checks with check_cell and check_res.

I think it will be valuable in terms of code reuse for the vectorized API to call the internal Cython functions. So to keep performance high, I'd advocate for the Cython API to be fully typed. I.e.:

  • Instead of res = None, use res = -1
  • Create exception messages in Python instead of Cython?
  • Analyze the Cython report to see where it's calling into Python

If we also remove any Python types from Cython, our Cython API can be marked as nogil, which would allow our vectorized API to release the GIL.

Bench script

Note this needs to be saved as an .ipy file and run with ipython bench.ipy

import h3.unstable.vect as vect
import numpy as np

N = 100_000
K = 10

lats, longs = np.random.uniform(0, 90, N), np.random.uniform(0, 90, N)
h3_index = vect.geo_to_h3(lats, longs, 9)

print('geo_to_h3 bench')
%timeit vect.geo_to_h3(lats, longs, 9)

print('h3_to_parent bench')
%timeit vect.h3_to_parent(h3_index, 8)
# 411 µs ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

print('h3_get_resolution bench')
%timeit vect.h3_get_resolution(h3_index)

Originally posted by @kylebarron in https://github.com/uber/h3-py/issues/224#issuecomment-1105804901

kylebarron avatar Apr 25 '22 16:04 kylebarron

Totally on board with this as we start to expose the Cython API! The appearance of Python types here was a result of the fact that the original API wasn't designed with vectorization in mind.

ajfriend avatar May 01 '22 00:05 ajfriend

In terms completely removing any Python types from the Cython level, I think there's some overlap with https://github.com/uber/h3-py/issues/267 and https://github.com/uber/h3-py/issues/265

ajfriend avatar Jul 24 '22 03:07 ajfriend