
Numba support

Open • MainRo opened this issue 4 years ago • 13 comments

From belm0:

Performance-wise, I consider pypy the only option today for a real application.

I'm not sure what this means. Pypy has limitations (it is not a 1:1 replacement for CPython), and there are legacy applications which cannot easily transition to pypy, or which are heavily dependent on numpy. For such applications, a numpy + numba implementation is useful. Having such an implementation does not preclude having a pure Python implementation which supports pypy. They can exist alongside each other.

I do not want to use numpy because in a streaming application it will never allow CPython to close the gap with pypy, and numpy just breaks pypy's JIT. However, I am interested in numba support.

A numba-only implementation will not perform, because numba does not support fast mode with Python arrays. The only way to get performance on this algorithm with numba is via numpy arrays.
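To illustrate the point about numpy arrays, here is a minimal sketch (not taken from distogram or from belm0's implementation) of one hot inner step, finding the pair of adjacent bins with the smallest gap, written as a Numba `njit` kernel over a numpy array. The function name `min_gap_index` is hypothetical, and the fallback decorator exists only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compile in nopython mode when Numba is available
except ImportError:  # fallback: run as plain (slow) Python if Numba is missing
    def njit(func):
        return func


@njit
def min_gap_index(values):
    """Return i such that values[i + 1] - values[i] is the smallest gap.

    `values` is assumed to be a sorted 1-D float64 numpy array with at
    least two elements. With Numba available, this loop is compiled to
    machine code because it only touches a typed numpy array.
    """
    best_i = 0
    best_gap = values[1] - values[0]
    for i in range(1, values.shape[0] - 1):
        gap = values[i + 1] - values[i]
        if gap < best_gap:
            best_gap = gap
            best_i = i
    return best_i


bins = np.array([0.0, 1.0, 1.5, 4.0, 4.1, 9.0])
print(min_gap_index(bins))  # prints 3 (the 4.0 / 4.1 pair)
```

The same loop over a plain Python list would force Numba to convert (reflect) the list on every call, which is the overhead the comment above refers to.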

I adapted the distogram bench to streamhist to compare the update function. On CPython distogram is 25% faster, and on pypy distogram is 13 times faster.
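A comparison like this can be reproduced with a small timing harness. The sketch below is hypothetical: `bench_update` and `append_update` are illustrative names, not the actual distogram bench, and any function with an `update(state, value)` shape can be plugged in.

```python
import random
import time


def bench_update(update, state, n=100_000, seed=0):
    """Time n calls of update(state, value) and return updates per second.

    `update` is a hypothetical stand-in for distogram's or streamhist's
    update function; `state` is whatever object it mutates.
    """
    rng = random.Random(seed)
    # Pre-generate the input so only the update calls are timed.
    data = [rng.normalvariate(0.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    for x in data:
        update(state, x)
    elapsed = time.perf_counter() - start
    return n / max(elapsed, 1e-9)  # guard against a zero-length interval


def append_update(state, value):
    # Trivial baseline: just store the value.
    state.append(value)


rate = bench_update(append_update, [], n=10_000)
print(f"{rate:,.0f} updates/s")
```

Running the same harness under CPython and PyPy and taking the ratio of the two update rates gives relative figures such as "25% faster" or "13 times faster".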

I measured my pure Python implementation (no numba or numpy) vs. distogram. It is 20% faster (and less code, but I didn't compare closely). The implementation uses a "maintain cost function array" approach just as distogram does. So distogram appears to have some room for improvement.
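For readers unfamiliar with the "maintain cost function array" approach, here is a minimal pure Python sketch under stated assumptions: bins are (value, count) pairs, the cost of merging two neighbours is taken to be their plain distance (distogram may use a different cost function), and the class name `CostArrayHistogram` is hypothetical. The idea is that the gaps between neighbouring bins are stored in a list and only the entries touched by an insert or merge are patched, instead of recomputing every difference on each update.

```python
import bisect


class CostArrayHistogram:
    """Hypothetical streaming histogram with at most `max_bins` bins."""

    def __init__(self, max_bins=64):
        self.max_bins = max_bins
        self.values = []  # sorted bin centres
        self.counts = []  # counts aligned with values
        self.costs = []   # costs[i] = values[i + 1] - values[i]

    def update(self, x):
        i = bisect.bisect_left(self.values, x)
        if i < len(self.values) and self.values[i] == x:
            self.counts[i] += 1  # exact hit: no new bin, costs unchanged
            return
        self.values.insert(i, x)
        self.counts.insert(i, 1)
        self._insert_cost(i)
        if len(self.values) > self.max_bins:
            self._merge_cheapest()

    def _insert_cost(self, i):
        # Patch only the gaps next to the newly inserted bin i.
        v, d = self.values, self.costs
        n = len(v)
        if n == 1:
            return
        if i == 0:
            d.insert(0, v[1] - v[0])
        elif i == n - 1:
            d.append(v[i] - v[i - 1])
        else:
            d[i - 1] = v[i] - v[i - 1]
            d.insert(i, v[i + 1] - v[i])

    def _merge_cheapest(self):
        v, c, d = self.values, self.counts, self.costs
        j = min(range(len(d)), key=d.__getitem__)  # cheapest adjacent pair
        total = c[j] + c[j + 1]
        v[j] = (v[j] * c[j] + v[j + 1] * c[j + 1]) / total  # weighted centre
        c[j] = total
        del v[j + 1], c[j + 1], d[j]
        # Refresh the two gaps that border the merged bin.
        if j > 0:
            d[j - 1] = v[j] - v[j - 1]
        if j < len(v) - 1:
            d[j] = v[j + 1] - v[j]


h = CostArrayHistogram(max_bins=4)
for x in [1.0, 2.0, 10.0, 11.0, 30.0]:
    h.update(x)
print(h.values, h.counts)  # prints [1.5, 10.0, 11.0, 30.0] [2, 1, 1, 1]
```

This sketch still scans the cost list to find its minimum; faster variants might keep a heap or, as in the numba discussion, run the scan over a numpy array.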

My numba+numpy implementation is 20x faster than streamhist (with 64 bins).

MainRo, Jun 21 '20 13:06