rvlib
rvlib copied to clipboard
Poor performance in Jitted loops
@danielcsaba @spencerlyon2
I'm getting poor performance here in exactly the kind of scenario where I would like to use RVlib:
https://gist.github.com/anonymous/c3387ab7408609a49472c3020517bbd6
Any ideas on what's happening?
Hi @jstac,
I had a quick look and it's failing in nopython
mode---seemingly due to a type clash. My first attempts at fixing it were not successful. I'll try and look into it in more detail.
I don't have the full sulution right away but there are two comments that can be made:
- nopython mode doesn't work because the class is defined out of the jitted loop. Moving its definition inside the function (or passing it as argument solves this problem. I don't know whether it is a long run limitation of numba.
-
f.rand(1)
returns a newly allocated 1d array, so that one should dof.rand(1)[0]
to get the associated scalar. In terms of API, one might expectf.rand()
to return a scalar ? The following code takes 2s, vs 0.5s for the other options. I would hazard the remaining gap might come from the aforementioned array creation and maybe some overhead associated to objects:
@jit(nopython=True)
def ar1_sample_mean(N, alpha, beta, s):
f = Normal(0, 1)
x = beta / (1 - alpha)
sm = 0.0
for i in range(N):
x = beta + alpha * x + s * f.rand(1)[0]
sm += x
return sm / N
@albop Thanks. I tried and got the same result. That's quite reassuring.... But still not as quick as with np.random.randn()
, as you say.
It's not great to have this fragility, but perhaps OK if we give good documentation and examples.
I think f.rand()
should return a scalar, and I wonder if that wouldn't make a difference (i.e., improve the speed a bit more)
hi @jstac and @albop
thanks Pablo for pointing out the issue.
regarding the f.rand()
, our problem was that jitclass
was not able to handle optional and default arguments so we had to pass the number as an argument no matter its value.
I reckon the np.random.randn
is the best integrated with numba
. The jitclass
won't give the same performance but still makes it possible to pass a wider variety of distributions in nopython
mode.