Decouple from Distributions.jl
Good people of DynamicPPL-land. I'm back from vacation, which also means I'm back with crazy ideas.
Here's my first one for now:
What if we add our own `logpdf`, etc. in DynamicPPL?
I ask for the following reasons:
- Allows us to define specific AD rules, e.g. we could use Enzyme for `logpdf` computations without affecting the broader community, making inference much faster for certain models. This is very useful when you want to squeeze extra performance out of models that aren't fully compatible with a given AD framework, without messing with the other packages in the environment.
- Broadening support for RHS types. By decoupling the `logpdf` and `rand` methods from Distributions, it becomes much easier to support non-Distributions, e.g. MeasureTheory.jl.
- We're not using the full Distributions.jl interface, nor do we need it.
Making the change would be trivial, since it would just be a matter of adding `logpdf(d, x) = Distributions.logpdf(d, x)` and so on. It would also, depending on how we execute this, provide us with a way to avoid all the type piracy we do in DistributionsAD.jl (we could potentially define a `logpdf`-like thing there, which is then used in DynamicPPL.jl).
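To make that concrete, here is a minimal sketch of such a forwarding layer (the module name `MiniDPPL` is made up for illustration; this is not actual DynamicPPL code):

```julia
module MiniDPPL  # stand-in for DynamicPPL; the name is illustrative only

import Distributions
import Random

# Generic functions owned by this package, with Distributions.jl fallbacks.
# Because the functions are owned here, downstream packages can overload
# them for their own types without committing type piracy.
logpdf(d, x) = Distributions.logpdf(d, x)
rand(rng::Random.AbstractRNG, d) = Random.rand(rng, d)

end # module

using Distributions: Normal

# The fallback simply delegates to Distributions.logpdf.
MiniDPPL.logpdf(Normal(0.0, 1.0), 0.0)  # ≈ -0.5 * log(2π)
```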
Thoughts?
@devmotion @yebai maybe?
It is sensible, but perhaps for DynamicPPL/Turing 2.0.
It is a very trivial change to make, though, with no real maintenance cost, while allowing 3rd-party developers to more easily extend how things are treated. I personally would have found this useful in some of the performance-sensitive projects we've been involved in.
Here are some functions that we should consider if this goes ahead:

- `logpdf`
- `rand`
- `flatten` <- `vectorise`
- `unflatten` <- `reconstruct`
We could have generic fallbacks for all distributions from Distributions.jl, then allow users to overload these functions for their own distributions.
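A hedged sketch of that fallback-plus-overload pattern (`mylogpdf` and `UnitUniform` are made-up names for illustration, not proposed API):

```julia
import Distributions

# Generic fallback: every Distributions.jl distribution works out of the box.
mylogpdf(d::Distributions.Distribution, x) = Distributions.logpdf(d, x)

# A non-Distributions type opts in by overloading the same function.
struct UnitUniform end  # toy density: uniform on [0, 1]
mylogpdf(::UnitUniform, x::Real) = 0.0 <= x <= 1.0 ? 0.0 : -Inf

mylogpdf(Distributions.Exponential(1.0), 1.0)  # fallback → -1.0
mylogpdf(UnitUniform(), 0.5)                   # overload → 0.0
```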
EDIT: optional API that can be very helpful:

- `condition(d::DynamicPPLDistribution, x::NamedTuple)`
- gradient rules, including `rrule` and `frule`
I'm on vacation, hence only some quick comments:
> Allows us to define specific AD rules, e.g. we can use Enzyme for `logpdf` computations without affecting the broader community
IMO this is a major disadvantage: why should we not let the whole community benefit from AD improvements? IMO we should just add AD rules for `logpdf` (that are an improvement over generic AD) to Distributions.
> Broadening support for RHS types. By decoupling the `logpdf` and `rand` methods from Distributions, it becomes much easier to support non-Distributions, e.g. MeasureTheory.jl.
Isn't this what interfaces such as DensityInterface are designed for, which are AFAIK supported by both Distributions and MeasureTheory?
> IMO this is a major disadvantage: why should we not let the whole community benefit from AD improvements? IMO we should just add AD rules for `logpdf` (that are an improvement over generic AD) to Distributions.
I most certainly agree with this! But sometimes we might want to make more "opinionated" changes that are specifically beneficial to Turing. For example, would you be willing to introduce an `rrule` for `Distributions.logpdf` that uses Enzyme to implement it? To me that seems a bit "too" opinionated to make its way into the entire ecosystem, but it would most certainly benefit 95% of Turing.jl users.
> Isn't this what interfaces such as DensityInterface are designed for, which are AFAIK supported by both Distributions and MeasureTheory?
Aaah that might be true! Had completely forgotten about DensityInterface; my bad.
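For completeness, a small sketch of what going through DensityInterface looks like (this assumes the DensityInterface methods that Distributions.jl already implements):

```julia
using DensityInterface: logdensityof, DensityKind, HasDensity
using Distributions: Normal, logpdf

d = Normal(0.0, 1.0)

# Distributions.jl implements DensityInterface, so any package that codes
# against logdensityof works with Distributions (and MeasureTheory) objects.
DensityKind(d) === HasDensity()       # d is an object with an associated density
logdensityof(d, 0.0) == logpdf(d, 0.0)
```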
> For example, would you be willing to introduce an `rrule` for `Distributions.logpdf` that uses Enzyme to implement it?
I think in general that should be done neither in Distributions nor in DynamicPPL. According to my understanding, `rrule`s shouldn't call out to completely different AD systems, or at least only to ones that are as well-established, stable, and fundamental as ForwardDiff.
I very much agree with you, but the reality is that AD performance is a huge bottleneck for us. Being able to do some specialized stuff here and there that is "not great in general" could be quite useful to squeeze out more perf.
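To make the trade-off concrete, here is what such a specialized rule could look like: a toy, hand-written `rrule` for one `logpdf` method (gradient w.r.t. `x` only, not Enzyme-backed; purely illustrative, and note that defining it on `Distributions.logpdf` is exactly the kind of piracy being debated, which owning the function in DynamicPPL would avoid):

```julia
using ChainRulesCore: ChainRulesCore, NoTangent
using Distributions: Normal, logpdf

# Toy reverse rule for logpdf(::Normal, x), differentiating w.r.t. x only.
# d/dx logpdf(Normal(μ, σ), x) = (μ - x) / σ²
function ChainRulesCore.rrule(::typeof(logpdf), d::Normal, x::Real)
    μ, σ = d.μ, d.σ
    y = logpdf(d, x)
    function logpdf_pullback(ȳ)
        return (NoTangent(), NoTangent(), ȳ * (μ - x) / σ^2)
    end
    return y, logpdf_pullback
end
```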