
Easy way to get gradient evaluation timing

Open maedoc opened this issue 4 years ago • 6 comments

The performance tips page doesn't mention how to get timing for the gradient evaluation of a model. CmdStan prints this before running the sampler, and it is a helpful number for distinguishing code performance issues from parametrisation issues. Could a tip be added for this?

maedoc avatar Oct 25 '21 20:10 maedoc
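[Editor's note: a minimal sketch of timing one gradient evaluation outside of sampling, assuming a recent DynamicPPL / LogDensityProblemsAD setup. The `demo` model is a made-up toy, and these APIs have shifted across versions, so treat this as illustrative rather than canonical.]

```julia
using Turing, DynamicPPL, LogDensityProblems, LogDensityProblemsAD
using ForwardDiff, BenchmarkTools

@model function demo(x)
    μ ~ Normal(0, 1)
    x .~ Normal(μ, 1)
end

model = demo(randn(100))

# Wrap the model as a log-density problem and attach an AD backend.
ℓ = DynamicPPL.LogDensityFunction(model)
∇ℓ = ADgradient(:ForwardDiff, ℓ)

# Benchmark log density + gradient at a random point in unconstrained space.
θ = randn(LogDensityProblems.dimension(∇ℓ))
@btime LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ)
```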

More generally, I think it'd be nice if we had more fine-grained access to gradient evaluations. For example, I had a use case where I wanted to examine the gradients as they rolled in, but I ended up writing a custom fork of Turing to do so.

cpfiffer avatar Oct 27 '21 15:10 cpfiffer

I guess the main issue is that you have many more choices in Turing/Julia for computing gradients than in Stan (and of course some samplers don't use gradients at all). My hope is that AD backends start to adopt AbstractDifferentiation, which would allow us to use one common API for all differentiation backends and to support every backend that implements it automatically (well, in theory at least, since the backends still have to support differentiating the models and e.g. Distributions, but this would have to be fixed in other packages, not in Turing).

More practically, I think it would be better to document how to benchmark gradient computations with different backends, and possibly to provide some convenience functions (again, hopefully this becomes easier with AbstractDifferentiation). I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well.

I guess for gradient tracking we would have to report the gradient in the transition, and then users could use a callback (possibly a helper would be useful).

devmotion avatar Oct 27 '21 15:10 devmotion
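[Editor's note: a sketch of the multi-backend benchmarking idea, reusing `model` and `ℓ` from the earlier sketch. Backend support varies, and the loaded AD packages below are assumptions. BenchmarkTools runs many samples, so the reported minimum excludes compilation time, which addresses the concern above.]

```julia
using LogDensityProblems, LogDensityProblemsAD
using ForwardDiff, ReverseDiff, BenchmarkTools

θ = randn(LogDensityProblems.dimension(ℓ))

# Compare gradient timings across AD backends at the same point.
for backend in (:ForwardDiff, :ReverseDiff)
    ∇ℓ = ADgradient(backend, ℓ)
    t = @belapsed LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ)
    println(rpad(backend, 12), t, " s")
end
```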

While I mentioned gradients, I think some feedback on how long model evaluations are taking would be helpful for any method, especially in Julia, where one wants to check that some optimisation technique has taken effect. This is less of an issue with Stan, where performance is more predictable for non-experts. For context, this Discourse thread is where this issue/request came from.

I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well.

Doesn't one need to compile anyway? A few extra evaluations aren't much to pay for giving this information to the user; it's technically overhead, but so is a progress bar.

maedoc avatar Oct 29 '21 11:10 maedoc

doesn't one need to compile anyway?

Not in the sense that Stan does it. Turing is all JIT-compiled, so the initial timings would be very poor anyway.

A few extra evaluations aren't much to pay for giving this information to the user; it's technically overhead, but so is a progress bar.

A different way to think about this is to keep a running estimate of gradient/joint timings and report them once they stop moving around; after maybe 100 evaluations it could report an accurate time without incurring extra evaluations.

cpfiffer avatar Oct 29 '21 15:10 cpfiffer
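[Editor's note: a purely hypothetical sketch of such a running estimate, using Welford's online mean/variance; `RunningTimer`, `record!`, and `isstable` are made-up names, not part of Turing. The sampler would call `record!` with each evaluation's wall time and report `t.mean` once `isstable(t)` returns true.]

```julia
# Running estimate of per-evaluation timing via Welford's algorithm.
mutable struct RunningTimer
    n::Int
    mean::Float64
    m2::Float64
end
RunningTimer() = RunningTimer(0, 0.0, 0.0)

function record!(t::RunningTimer, seconds::Real)
    t.n += 1
    δ = seconds - t.mean
    t.mean += δ / t.n
    t.m2 += δ * (seconds - t.mean)
    return t
end

# "Stopped moving around": the standard error of the mean is small
# relative to the mean itself.
isstable(t::RunningTimer; rtol=0.05) =
    t.n ≥ 10 && sqrt(t.m2 / (t.n * (t.n - 1))) < rtol * t.mean
```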

For just benchmarking the AD performance, this used to work: https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7

Probably still works.

IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration but this is still useful information).

torfjelde avatar Oct 29 '21 18:10 torfjelde
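[Editor's note: a crude way to estimate seconds per iteration by hand today, reusing `model` from the earlier sketch. The warm-up run is there so compilation is not counted; this is only a sketch.]

```julia
using Turing

# Warm-up run triggers compilation of the model and sampler code paths.
sample(model, NUTS(), 10; progress=false)

# Timed run; wall time divided by iterations gives a rough per-iteration cost.
n = 1_000
t = @elapsed sample(model, NUTS(), n; progress=false)
println("≈ ", t / n, " seconds per iteration")
```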

IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration but this is still useful information).

Agreed. I suppose we'd throw this into the default logger, no?

cpfiffer avatar Oct 30 '21 02:10 cpfiffer

See https://github.com/TuringLang/DynamicPPL.jl/pull/346 and https://github.com/torfjelde/TuringBenchmarking.jl

yebai avatar Nov 12 '22 20:11 yebai
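[Editor's note: TuringBenchmarking.jl packages up the benchmarking pattern discussed above. Assumed usage based on its README at the time of writing; check the repository for the current API.]

```julia
using TuringBenchmarking

# Build a BenchmarkTools suite covering model evaluation and gradients,
# then run it.
suite = TuringBenchmarking.make_turing_suite(model)
run(suite)
```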