distogram
Numba support
From belm0:
Performance-wise, I consider pypy the only option today for a real application.
I'm not sure what this means. Pypy has limitations (it is not a 1:1 replacement for CPython), and there are legacy applications which cannot transition to pypy easily, or which are heavily dependent on numpy. For such applications, a numpy + numba implementation is useful. Having such an implementation does not preclude having a pure Python implementation which supports Pypy. They can exist alongside each other.
I do not want to use numpy because, on a streaming application, it will never allow CPython to close the gap with pypy, and numpy just breaks pypy's jit. However I am interested in numba support,
A numba-only implementation will not perform well, because numba does not support fast mode with Python arrays. The only way to get performance on this algorithm with numba is via numpy arrays.
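To make the numpy + numba point concrete, here is a minimal sketch of one step of such an algorithm (merging the two closest bins) written against preallocated numpy arrays. The function name `merge_closest` and the array layout are hypothetical and are not taken from distogram or from my implementation. The `njit` fallback is only there so the sketch still runs, uncompiled, where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same code as plain Python.
    def njit(fn):
        return fn


@njit
def merge_closest(values, counts, n):
    """Merge the two adjacent bins with the smallest gap, in place.

    values and counts are preallocated float64 arrays, sorted by value;
    only the first n entries are in use. Returns the new bin count.
    """
    # Find the pair of neighbours that are closest together.
    best = 0
    best_gap = values[1] - values[0]
    for i in range(1, n - 1):
        gap = values[i + 1] - values[i]
        if gap < best_gap:
            best_gap = gap
            best = i

    # Replace the pair with its count-weighted mean.
    total = counts[best] + counts[best + 1]
    values[best] = (
        values[best] * counts[best] + values[best + 1] * counts[best + 1]
    ) / total
    counts[best] = total

    # Shift the remaining bins left by one slot.
    for i in range(best + 1, n - 1):
        values[i] = values[i + 1]
        counts[i] = counts[i + 1]
    return n - 1
```

Note that this sketch rescans the gaps on every call for simplicity; it does not maintain a cost array between calls.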
I adapted the distogram bench to streamhist to compare the update function. On CPython distogram is 25% faster, and on pypy distogram is 13 times faster.
I measured my pure Python implementation (no numba or numpy) vs. distogram. It is 20% faster (and less code, but I didn't compare closely). The implementation uses a "maintain cost function array" approach just as distogram does. So distogram appears to have some room for improvement.
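For readers unfamiliar with the "maintain cost function array" idea, here is a rough pure Python sketch. It is not my measured implementation and not distogram's code: the class name, attributes, and the choice of gap-between-neighbours as the cost are assumptions for illustration. The point is that `costs` is patched only around the bin that changed, rather than being recomputed on every update.

```python
import bisect


class StreamingHistogram:
    """Hypothetical fixed-size streaming histogram (illustration only)."""

    def __init__(self, max_bins=64):
        self.max_bins = max_bins
        self.values = []  # sorted bin centroids
        self.counts = []  # number of points in each bin
        self.costs = []   # costs[i] == values[i + 1] - values[i]

    def update(self, x):
        i = bisect.bisect_left(self.values, x)
        if i < len(self.values) and self.values[i] == x:
            self.counts[i] += 1  # exact hit: no gap changes
            return
        self.values.insert(i, x)
        self.counts.insert(i, 1)
        n = len(self.values)
        if n > 1:
            if i == 0:
                self.costs.insert(0, self.values[1] - x)
            elif i == n - 1:
                self.costs.append(x - self.values[i - 1])
            else:
                # The new bin splits one existing gap into two.
                self.costs[i - 1] = x - self.values[i - 1]
                self.costs.insert(i, self.values[i + 1] - x)
        if n > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        j = self.costs.index(min(self.costs))
        c0, c1 = self.counts[j], self.counts[j + 1]
        merged = (self.values[j] * c0 + self.values[j + 1] * c1) / (c0 + c1)
        self.values[j] = merged
        self.counts[j] = c0 + c1
        del self.values[j + 1]
        del self.counts[j + 1]
        del self.costs[j]
        # Refresh only the gaps that touch the merged bin.
        if j > 0:
            self.costs[j - 1] = merged - self.values[j - 1]
        if j < len(self.costs):
            self.costs[j] = self.values[j + 1] - merged
```

Usage is one `update(x)` call per incoming sample; the number of bins never exceeds `max_bins`.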
My numba+numpy implementation is 20x faster than streamhist (with 64 bins).