code accessing numpy array elements slower on cinder (jit on or off)
The example below is fairly straightforward, and gets about a 20x speedup under numba. If I understand correctly, numba achieves that by accessing the numpy C API directly from jitted code, so the Python VM isn't involved. (I don't expect the Cinder JIT to do that.)
Still, Cinder (no JIT) is 50% slower on this code than stock CPython (not using numba). And with the Cinder JIT enabled on this function exclusively, there is no improvement over the no-JIT case. (I confirmed that the function is compiled.)
What makes Cinder slower than stock CPython when touching numpy arrays?
import numpy

def masked_mean(array: numpy.ndarray):
    n_rows, n_cols = array.shape
    mean = [0.0] * n_cols
    for j in range(n_cols):
        sum_ = 0.0
        n = 0
        for i in range(n_rows):
            val = array[i, j]
            if val > 0.0:
                sum_ += val
                n += 1
        mean[j] = sum_ / n if n > 0 else -1.0
    return mean
Good question. I don't know off the top of my head, but if you're interested, could you try running it under the Linux perf tool? With such a big discrepancy, I'd hope the root cause would be fairly obvious when comparing a perf report from Cinder with one from stock CPython.
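For a like-for-like comparison, it may help to run the same small driver under both interpreters. The following is only a sketch: the function body is copied from the report above, while `make_input`, `time_masked_mean`, the array shape, the seed, and the repeat count are all made up here for illustration and are not taken from the original benchmark.

```python
import time

import numpy


def masked_mean(array: numpy.ndarray):
    # Same function as in the report: per-column mean of the positive entries,
    # or -1.0 for a column that has none.
    n_rows, n_cols = array.shape
    mean = [0.0] * n_cols
    for j in range(n_cols):
        sum_ = 0.0
        n = 0
        for i in range(n_rows):
            val = array[i, j]
            if val > 0.0:
                sum_ += val
                n += 1
        mean[j] = sum_ / n if n > 0 else -1.0
    return mean


def make_input(n_rows=2000, n_cols=50, seed=0):
    # Hypothetical input: standard-normal values, so roughly half are positive.
    rng = numpy.random.default_rng(seed)
    return rng.standard_normal((n_rows, n_cols))


def time_masked_mean(array, repeats=5):
    # Best-of-N wall-clock time, in seconds, for a single call.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        masked_mean(array)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    data = make_input()
    print(f"best of 5: {time_masked_mean(data):.4f} s")
```

Saved under some name such as `driver.py` (a hypothetical filename), the script could be profiled with `perf record -g -- <interpreter> driver.py` for each interpreter, and the two `perf report` outputs compared side by side. A larger repeat count gives perf more samples to work with.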