Distributions.jl
Distributions.jl copied to clipboard
Add Affine distributions
Some more high-level comments:
- I think we should not introduce any additional API. and hence in particular not add
affine
: We should continue to use*
and+
(and/
and-
) to construct affine transformations of random variables/distributions. - Maybe we should use the opportunity and not just rename
LocationScale
and remove the check for the scale but actually change the design and allow general array-like variates, i.e., make it a subtype ofDistribution{ArrayLikeVariate{N},<:ValueSupport}
and allow arrays as location and scale. - I wonder if it would be useful to split
AffineDistribution
into aShiftedDistribution
and aScaledDistribution
I agree on the array-like variates bit and actually started work on that yesterday.
I agree we should encourage and use *
, +
, /
, and -
over affine
, but I think it makes sense to include affine
for ease of implementation. This way anyone overloading affine
only has to add 1 method instead of 4.
I'm not sure what benefits a split would give, and it might make type-checking and conversions more of a chore since Scaled{Shifted}
would be different from Shifted{Scaled}
. In particular we'd have to add more methods to collapse Affine
down when performing repeated affine transformations.
I agree we should encourage and use
*
,+
,/
, and-
overaffine
, but I think it makes sense to includeaffine
for ease of implementation. This way anyone overloadingaffine
only has to add 1 method instead of 4.
There are default fallbacks for /
and -
, so generally one just has to define x + distribution
and x * distribution
. I think this is simple enough for developers and not more difficult than defining affine
. Additionally, having both affine
and +
, ... seems confusing - if possible, we should not provide multiple ways but one recommended one for constructing affine transformations.
Unrelated to the previous comments, we might want some field name other than scale
, which seems to be a bit too easy to confuse with the scale
function. Greek letters work in the worst-case scenario but are annoying to type in the REPL.
@devmotion it turns out that this is already implemented in MeasureTheory.jl, and quite nicely too. Chad offered to move it to MeasuresBase, since you mentioned you'd be interested in adding MeasuresBase as a dependency. Should we maybe just add MeasuresBase as a dependency and reexport this feature?
No, this is not sufficient. We want to create and work with Distribution
s, *
etc. should not suddenly lift Distribution
s to Measure
s.
If we go with +
and -
, the docstring should mention ⊕ from #1445
We have to support +
, *
etc. anyway if we want to avoid a breaking change since it is the current API.
To me it seems an additional function would not have any advantage but only extend the API, which is possibly confusing for users, developers, and an additional source for breaking changes in the future. The main argument above for it, it seems, was that it would be easier to implement. Actually, I think this is not correct:
- As the PR for
Normal
etc. shows, one just has to implement two functions for univariate distributions currently - The more modular design makes it much easier to add specialized implementations for specific types: With
affine
one would have to handle all combinations of the types for scale and shift whereas otherwise one can implement a method for eg a specific type of scales without worrying about types of shifts.
Got it; I'll rewrite this to handle +
and *
directly.
Should we memoize inv(σ)
since there's a good chance it'll be frequently used?
Should we memoize
inv(σ)
since there's a good chance it'll be frequently used?
No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int
, the cached inverse would be a Float64
and cause promotions in computations with Float32
that don't occur if we just divide by sigma).
Should we memoize
inv(σ)
since there's a good chance it'll be frequently used?No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an
Int
, the cached inverse would be aFloat64
and cause promotions in computations withFloat32
that don't occur if we just divide by sigma).
Hmm; could we memoize in cases where base_dist
is continuous, since everything has to be converted to eltype(base_dist)
(after promoting with the scale+shift params) anyways?
No, eltype
should (and will) not match the parameters in general. If one wants to e.g. evaluate just the log density it is irrelevant what the eltype
is - one should be able to provide e.g. a point in Float32
and obtain a result that is not promoted to Float64
.
I'm very certain memoization just causes problems here and is not worth it. As an example in a different package, similar issues occurred with memoization of inverses in PDMats and hence in the latest release it was removed.
The right approach is not to memoize but just to not constrain the type of the scale so if needed one can use a wrapper for the inverse scale (similar to a factorization object) as scale object
@devmotion I think the PR is mostly done apart from --
- Tests for the multivariate case (which I'm writing)
- Whatever is wrong with the discrete CDF functions, which are failing the tests; if you know what I'm doing wrong I'm all ears.
Whatever is wrong with the discrete CDF functions, which are failing the tests; if you know what I'm doing wrong I'm all ears.
I don't know which error you face, so I can't really help here.
I think some things should be designed a bit differently. E.g., it is completely fine to translate
Distribution{<:ArrayLikeVariate}
of arbitrary dimension but a scale ofDiagonal(Ones(...))
orI
only makes sense forMultivariateDistribution
s andMatrixDistribution
s. Therefore I think it would be better to splitAffineDistribution
in two different distributions, one that shifts and one that applies linear transformations.
Maybe we could use the following design, which seems to address also your concerns about complicated dispatches and constructions of affine transformations above:
- Add a
ShiftedDistribution
/TranslatedDistribution
that is the fallback forx::Real + dist::UnivariateDistribution
andx::AbstractArray{<:Real,N} + dist::Distribution{ArrayLikeVariate{N}}
, i.e., the distribution ofx + Z
where the law ofZ
isdist
- Add a
ScaledDistribution
(not sure about the name) that is the fallback forx::Real * dist::Distribution{<:ArrayLikeVariate}
andx::AbstractArray * dist::Distribution{<:ArrayLikeVariate}
(forx
of the correct last dimension), i.e., the distribution ofx * Z
where the law ofZ
isdist
- Add an
AffineDistribution
that is the fallback forx::Real * dist::ShiftedDistribution
,x::AbstractArray * dist::ShiftedDistribution
,x::Real + dist::ScaledDistribution{Univariate}
, andx::AbstractArray{<:Real,N} + dist::ScaledDistribution{ArrayLikeVariate{N}}
Then we could support all transformations of the form a + Z
, b * Z
, a + b * Z
where a
and b
are of the correct sizes and we don't have to deal with neutral elements of addition and multiplication as in this PR. We could still dispatch on the fallback for affine transformations explicitly and there would be no problems with ShiftedDistribution
of ScaledDistribution
versus ScaledDistribution
of ShiftedDistribution
.
All tests now pass, except for support
when called on normal distributions that have ForwardDiff.Dual
numbers as arguments. I'm going to be completely honest and say I'm not sure what support
is supposed to do there.
I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale
(indicated by the test failures as well).
As a way forward, I think we should break up this PR (or maybe start from scratch) and
- make an initial PR that only replaces
LocationScale
withAffineDistribution
(with less restrictive type parameters), adds a deprecation warning forLocationScale
, and adds type aliases forLocationScale
- but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed; - make a separate PR that allows and tests negative scales for univariate distributions;
- make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
- make additional PRs that introduce
TranslatedDistribution
andScaledDistribution
to simplify and enable the handling of+
and*
, without changing the implementations forAffineDistribution
but only changing dispatches of*
and+
and combiningScaledDistribution
andTranslatedDistribution
toAffineDistribution
s.
I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by
LocationScale
(indicated by the test failures as well).As a way forward, I think we should break up this PR (or maybe start from scratch) and
- make an initial PR that only replaces
LocationScale
withAffineDistribution
(with less restrictive type parameters), adds a deprecation warning forLocationScale
, and adds type aliases forLocationScale
- but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;- make a separate PR that allows and tests negative scales for univariate distributions;
- make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
- make additional PRs that introduce
TranslatedDistribution
andScaledDistribution
to simplify and enable the handling of+
and*
, without changing the implementations forAffineDistribution
but only changing dispatches of*
and+
and combiningScaledDistribution
andTranslatedDistribution
toAffineDistribution
s.
If you'd like, I can break this into two PRs. The first can add a UnivariateAffine
which is just LocationScale
with a possibly negative scale; this would replace LocationScale
. A separate PR can add a MultivariateAffine
distribution.
I also would like to point out there have been essentially no changes to the behavior, of LocationScale
, with the exception of:
- Bug fixes for the
location
andscale
methods, which are currently returning incorrect answers. (They always returnlocationscale.μ
andlocationscale.σ
, regardless of the location and scale of the underlying distribution. For instance,LocationScale(0, 1, Normal(2, 4))
currently returns(0, 1)
.) - The implementations of new constructors and deprecation of the old
LocationScale()
-style constructor, as you requested.
All bugs were related to dealing with negative transformations of discrete distributions, where I forgot to account for the probability that x
is exactly equal to X
. There may be bugs regarding multivariate distributions that I haven't caught yet, since I was still writing the tests, but this PR should not contain any breaking changes.
If you'd like to rewrite the code for multivariate affine transformations from scratch, work out a new interface to replace the old LocationScale
interface from scratch, etc. that's completely reasonable. For now, I'll gladly make a separate PR implementing the code I had ready a week and a half ago. This code made no changes except removing the positivity checks for LocationScale
and throwing up a couple absolute value bars, before I'd rewritten all the constructors from scratch, deprecated or rewritten all of the old constructors to favor your own syntax using +
, -
, etc. instead of my original Affine
which was a simple reskin of LocationScale
, and implemented several features beyond the original scope of this PR at your request.
If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.
Just to be sure there's no misunderstanding: We should not have separate distributions but at most type aliases (maybe they re not needed though). As mentioned above, it would be preferable to allow negative scales only in a second PR to keep changes in the different PRs minimal and make them easier to review.
I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:
Unfortunately, while the behaviour might not have changed (I don't know) the implementation has to a larger extent than what seemed necessary. During the different rounds of preliminary review I noticed and commented on methods that were modified slightly or generalized (sometimes incorrectly). Due to the renaming and other unrelated changes it is not possible for me to check with certainty if changes are needed, are bug fixes, are necessary or unnecessary generalizations, or are just style changes.
Ok; if you'd like to implement renaming and all the different changes in a different PR, go ahead, then. I'll make a single PR that only changes LocationScale to allow negative scale values, or else (if you don't like this idea) I'll add reflections directly to a different package instead of spending more and more time making changes completely unrelated to this feature at your request.
Sure, it's seemingly simple changes and of course it could be done in fewer PRs - but Distributions has > 300 direct dependents and > 700 indirect ones and hence I won't merge any PRs, in particular no refactorings, for which I feel I can't review them properly and guarantee with reasonable confidence that they don't break anything. Therefore I suggest breaking it up into small PRs that are easy to review.