RFC: Sampler-specific AD settings instead of global AD settings
Currently, AD settings in Turing are defined at a global level and (partly) propagated to other packages in this way. This forces us to dispatch on Turing-specific mutable global state and, e.g., makes it impossible to run parallel sampling (with a custom implementation) with different AD backends concurrently. The problem in https://github.com/TuringLang/Turing.jl/issues/1400 and the work on https://github.com/TuringLang/Turing.jl/pull/1401 got me thinking: could we instead make the AD settings a local property of the AD-compatible samplers (similar to, e.g., ODE algorithms in OrdinaryDiffEq)?
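To make the idea concrete, here is a minimal sketch of what sampler-local AD settings could look like. All names (`ADBackend`, `ForwardDiffAD`, `ReverseDiffAD`, the `adtype` field, and the simplified `HMC` struct) are hypothetical and only illustrate the pattern, not an existing Turing API:

```julia
# Hypothetical sketch: the AD backend becomes a field of the sampler
# instead of mutable global state.
abstract type ADBackend end
struct ForwardDiffAD <: ADBackend end
struct ReverseDiffAD <: ADBackend end

struct HMC{AD<:ADBackend}
    epsilon::Float64
    n_leapfrog::Int
    adtype::AD
end

# The default backend is chosen per sampler instance via a keyword
# argument, not via a global setting:
HMC(epsilon, n_leapfrog; adtype::ADBackend=ForwardDiffAD()) =
    HMC(epsilon, n_leapfrog, adtype)

# Two samplers with different backends can now coexist, e.g. for
# concurrent sampling in different tasks:
sampler1 = HMC(0.05, 10)                            # ForwardDiff
sampler2 = HMC(0.05, 10; adtype=ReverseDiffAD())    # ReverseDiff
```

Since the backend is part of the sampler's type, gradient code can dispatch on `sampler.adtype` instead of querying a global flag, mirroring how OrdinaryDiffEq carries solver options in the algorithm object.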
The main problem I see right now is that other packages such as Bijectors or AdvancedVI still use a global state, which would have to be changed as well. Maybe Turing and these packages could share a common interface, defined in a separate package, for computing gradients etc. with all supported AD backends. Then, for instance, Turing would not have to define `gradient_logp` for every supported AD backend but could just call this interface's method for computing the forward and reverse pass in lines such as https://github.com/TuringLang/Turing.jl/blob/e1ab7e08b15687a81b6a0ce96b0f9792535939bb/src/core/ad.jl#L144-L147.
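Such a shared interface could be quite small. A possible sketch, assuming a made-up package and function name (`ADInterface`, `value_and_gradient`) purely for illustration:

```julia
# Hypothetical shared AD interface that Turing, Bijectors, AdvancedVI,
# etc. could all call, instead of each package defining its own
# per-backend gradient methods.
module ADInterface

using ForwardDiff

abstract type ADBackend end
struct ForwardDiffAD <: ADBackend end

# One generic entry point; each supported backend adds a method
# dispatching on its backend type.
function value_and_gradient(::ForwardDiffAD, f, x::AbstractVector)
    return f(x), ForwardDiff.gradient(f, x)
end

end # module
```

A reverse-mode backend (Tracker, ReverseDiff, Zygote, ...) would simply add its own `value_and_gradient` method, and downstream code like `gradient_logp` would reduce to a single call `ADInterface.value_and_gradient(backend, logp, theta)` with the backend taken from the sampler.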