Juho Timonen

10 comments by Juho Timonen

> The fix for B) is simple if we decide to add the amount of time we spend in `sampler.init_stepsize()` as part of warmup. This would make sense, because for...
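
As a rough sketch of what counting that time as warmup could look like (the `Sampler` type and its methods below are hypothetical stand-ins, not the real service code; only the placement of the timing calls is the point):

```cpp
#include <chrono>
#include <iostream>

// Hypothetical sampler type standing in for the real service code.
struct Sampler {
  void init_stepsize() { /* stepsize heuristic; may call the ODE solver many times */ }
  void warmup_iteration() { /* one adaptation iteration */ }
};

int main() {
  Sampler sampler;
  auto start = std::chrono::steady_clock::now();

  sampler.init_stepsize();  // counted as part of warmup, not as unreported extra time
  for (int i = 0; i < 1000; ++i) {
    sampler.warmup_iteration();
  }

  std::chrono::duration<double> warmup = std::chrono::steady_clock::now() - start;
  std::cout << "warmup time (incl. init_stepsize): " << warmup.count() << " s\n";
}
```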

> If this stage is taking millions of iterations then there’s a problem. It's not taking millions of those stepsize refinement iterations, just millions of evaluations of the ODE function...
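
A toy back-of-the-envelope sketch of why the two counts differ by orders of magnitude (all numbers below are invented for illustration): each refinement iteration costs a gradient, and one gradient of an ODE model can cost tens of thousands of right-hand-side evaluations.

```cpp
#include <cstdio>

// Invented numbers; the point is only the multiplication.
int main() {
  long refinement_iters   = 40;     // stepsize doubling/halving iterations
  long rhs_evals_per_grad = 50000;  // ODE right-hand-side calls per gradient (stiff solve)

  long total_rhs = refinement_iters * rhs_evals_per_grad;
  std::printf("%ld refinement iterations -> %ld ODE function evaluations\n",
              refinement_iters, total_rhs);
}
```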

> That's why I've never used those internal timings. The timing I always care about is in my app, which I can time myself externally. Given this, I'd be OK...
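
Timing the whole call externally needs nothing from the sampler's internal timers; a minimal sketch, with `run_sampling()` as a hypothetical stand-in for whatever launches sampling in the app:

```cpp
#include <chrono>
#include <iostream>

// Hypothetical stand-in for the call that launches sampling in the app.
void run_sampling() { /* ... call the sampling service ... */ }

int main() {
  auto t0 = std::chrono::steady_clock::now();
  run_sampling();
  auto t1 = std::chrono::steady_clock::now();

  std::chrono::duration<double> elapsed = t1 - t0;
  std::cout << "total sampling time: " << elapsed.count() << " s\n";
}
```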

> Further, we encountered a relatively large amount of additional time, besides warmup and sampling, spent by rstan’s `sampling()` function. In fact, the additional time was about twice the time for...

> We will just remove that if we decide to remove it from the services and go back to reporting total times per chain as it was originally, that is...

> But what do you mean by “substantial” here? The state and sampler initialization require maybe hundreds of gradient evaluations for particularly difficult problems, but then that cost should be...

Oh, and in that case I was able to reduce the time required by `init_stepsize()` from that 10%-25% to a negligible amount by setting `step_size=0.1` instead of the default `step_size=1`.
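
A toy sketch of the kind of doubling/halving initialization heuristic at play (the acceptance-probability curve below is invented; in the real ODE case each trial step costs a full gradient evaluation), illustrating why starting closer to the final stepsize can cut the number of expensive trial steps:

```cpp
#include <cmath>
#include <cstdio>

// Invented acceptance-probability curve: only small stepsizes "work".
double accept_prob(double eps) {
  return std::exp(-100.0 * eps * eps);
}

// Double or halve the stepsize until the acceptance probability crosses 0.5,
// counting how many (expensive, in the ODE case) trial evaluations that takes.
int evals_to_converge(double eps) {
  int evals = 1;
  bool go_up = accept_prob(eps) > 0.5;
  double factor = go_up ? 2.0 : 0.5;
  while (true) {
    eps *= factor;
    ++evals;
    if ((accept_prob(eps) > 0.5) != go_up) break;  // crossed 0.5: stop
  }
  return evals;
}

int main() {
  std::printf("start eps=1.0: %d trial evaluations\n", evals_to_converge(1.0));
  std::printf("start eps=0.1: %d trial evaluations\n", evals_to_converge(0.1));
}
```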

Yeah, so it seems that the current code assumes that if computing the log probability succeeds, then computing the gradient will also succeed. With ODEs, however, it is possible that log_prob...
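
A toy illustration of that failure mode, with made-up `log_prob` / `log_prob_grad` functions standing in for the real model functor (in the ODE case the gradient can fail because the sensitivity solve diverges even when the plain forward solve succeeded), showing why the gradient needs its own error handling:

```cpp
#include <cmath>
#include <iostream>
#include <stdexcept>

// Made-up value computation: the forward solve succeeds here.
double log_prob(double theta) {
  return -0.5 * theta * theta;
}

// Made-up gradient computation: the sensitivity solve can still fail.
double log_prob_grad(double theta) {
  if (std::fabs(theta) > 2.0) {
    throw std::runtime_error("sensitivity solve failed");
  }
  return -theta;
}

int main() {
  double theta = 3.0;
  std::cout << "log_prob ok: " << log_prob(theta) << "\n";  // succeeds
  try {
    log_prob_grad(theta);  // may still fail: check it separately
  } catch (const std::exception& e) {
    std::cout << "gradient failed: " << e.what() << "\n";
  }
}
```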