Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

Add Affine distributions

Open ParadaCarleton opened this issue 3 years ago • 25 comments

ParadaCarleton avatar Nov 28 '21 16:11 ParadaCarleton

Some more high-level comments:

  • I think we should not introduce any additional API. and hence in particular not add affine: We should continue to use * and + (and / and -) to construct affine transformations of random variables/distributions.
  • Maybe we should use the opportunity and not just rename LocationScale and remove the check for the scale but actually change the design and allow general array-like variates, i.e., make it a subtype of Distribution{ArrayLikeVariate{N},<:ValueSupport} and allow arrays as location and scale.
  • I wonder if it would be useful to split AffineDistribution into a ShiftedDistribution and a ScaledDistribution

devmotion avatar Nov 28 '21 20:11 devmotion

I agree on the array-like variates bit and actually started work on that yesterday.

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

I'm not sure what benefits a split would give, and it might make type-checking and conversions more of a chore since Scaled{Shifted} would be different from Shifted{Scaled}. In particular we'd have to add more methods to collapse Affine down when performing repeated affine transformations.

ParadaCarleton avatar Nov 28 '21 21:11 ParadaCarleton

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

There are default fallbacks for / and -, so generally one just has to define x + distribution and x * distribution. I think this is simple enough for developers and not more difficult than defining affine. Additionally, having both affine and +, ... seems confusing - if possible, we should not provide multiple ways but one recommended one for constructing affine transformations.

devmotion avatar Nov 28 '21 21:11 devmotion

Unrelated to the previous comments, we might want some field name other than scale, which seems to be a bit too easy to confuse with the scale function. Greek letters work in the worst-case scenario but are annoying to type in the REPL.

ParadaCarleton avatar Nov 29 '21 02:11 ParadaCarleton

@devmotion it turns out that this is already implemented in MeasureTheory.jl, and quite nicely too. Chad offered to move it to MeasuresBase, since you mentioned you'd be interested in adding MeasuresBase as a dependency. Should we maybe just add MeasuresBase as a dependency and reexport this feature?

ParadaCarleton avatar Dec 01 '21 01:12 ParadaCarleton

No, this is not sufficient. We want to create and work with Distributions, * etc. should not suddenly lift Distributions to Measures.

devmotion avatar Dec 01 '21 07:12 devmotion

If we go with + and -, the docstring should mention ⊕ from #1445

mschauer avatar Dec 01 '21 12:12 mschauer

We have to support +, * etc. anyway if we want to avoid a breaking change since it is the current API.

To me it seems an additional function would not have any advantage but only extend the API, which is possibly confusing for users, developers, and an additional source for breaking changes in the future. The main argument above for it, it seems, was that it would be easier to implement. Actually, I think this is not correct:

  • As the PR for Normal etc. shows, one just has to implement two functions for univariate distributions currently
  • The more modular design makes it much easier to add specialized implementations for specific types: With affine one would have to handle all combinations of the types for scale and shift whereas otherwise one can implement a method for eg a specific type of scales without worrying about types of shifts.

devmotion avatar Dec 01 '21 13:12 devmotion

Got it; I'll rewrite this to handle + and * directly.

ParadaCarleton avatar Dec 01 '21 18:12 ParadaCarleton

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

ParadaCarleton avatar Dec 01 '21 18:12 ParadaCarleton

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

devmotion avatar Dec 01 '21 18:12 devmotion

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

Hmm; could we memoize in cases where base_dist is continuous, since everything has to be converted to eltype(base_dist) (after promoting with the scale+shift params) anyways?

ParadaCarleton avatar Dec 01 '21 19:12 ParadaCarleton

No, eltype should (and will) not match the parameters in general. If one wants to e.g. evaluate just the log density it is irrelevant what the eltype is - one should be able to provide e.g. a point in Float32 and obtain a result that is not promoted to Float64.

devmotion avatar Dec 01 '21 19:12 devmotion

I'm very certain memoization just causes problems here and is not worth it. As an example in a different package, similar issues occurred with memoization of inverses in PDMats and hence in the latest release it was removed.

devmotion avatar Dec 01 '21 19:12 devmotion

The right approach is not to memoize but just to not constrain the type of the scale so if needed one can use a wrapper for the inverse scale (similar to a factorization object) as scale object

mschauer avatar Dec 02 '21 09:12 mschauer

@devmotion I think the PR is mostly done apart from --

  1. Tests for the multivariate case (which I'm writing)
  2. Whatever is wrong with the discrete CDF functions, which are failing the tests; if you know what I'm doing wrong I'm all ears.

ParadaCarleton avatar Dec 05 '21 06:12 ParadaCarleton

Whatever is wrong with the discrete CDF functions, which are failing the tests; if you know what I'm doing wrong I'm all ears.

I don't know which error you face, so I can't really help here.

devmotion avatar Dec 05 '21 18:12 devmotion

I think some things should be designed a bit differently. E.g., it is completely fine to translate Distribution{<:ArrayLikeVariate} of arbitrary dimension but a scale of Diagonal(Ones(...)) or I only makes sense for MultivariateDistributions and MatrixDistributions. Therefore I think it would be better to split AffineDistribution in two different distributions, one that shifts and one that applies linear transformations.

Maybe we could use the following design, which seems to address also your concerns about complicated dispatches and constructions of affine transformations above:

  • Add a ShiftedDistribution/TranslatedDistribution that is the fallback for x::Real + dist::UnivariateDistribution and x::AbstractArray{<:Real,N} + dist::Distribution{ArrayLikeVariate{N}}, i.e., the distribution of x + Z where the law of Z is dist
  • Add a ScaledDistribution (not sure about the name) that is the fallback for x::Real * dist::Distribution{<:ArrayLikeVariate} and x::AbstractArray * dist::Distribution{<:ArrayLikeVariate} (for x of the correct last dimension), i.e., the distribution of x * Z where the law of Z is dist
  • Add an AffineDistribution that is the fallback for x::Real * dist::ShiftedDistribution, x::AbstractArray * dist::ShiftedDistribution, x::Real + dist::ScaledDistribution{Univariate}, and x::AbstractArray{<:Real,N} + dist::ScaledDistribution{ArrayLikeVariate{N}}

Then we could support all transformations of the form a + Z, b * Z, a + b * Z where a and b are of the correct sizes and we don't have to deal with neutral elements of addition and multiplication as in this PR. We could still dispatch on the fallback for affine transformations explicitly and there would be no problems with ShiftedDistribution of ScaledDistribution versus ScaledDistribution of ShiftedDistribution.

devmotion avatar Dec 05 '21 19:12 devmotion

All tests now pass, except for support when called on normal distributions that have ForwardDiff.Dual numbers as arguments. I'm going to be completely honest and say I'm not sure what support is supposed to do there.

ParadaCarleton avatar Dec 08 '21 17:12 ParadaCarleton

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

  1. make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;
  2. make a separate PR that allows and tests negative scales for univariate distributions;
  3. make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
  4. make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

devmotion avatar Dec 08 '21 19:12 devmotion

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

  1. make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;
  2. make a separate PR that allows and tests negative scales for univariate distributions;
  3. make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
  4. make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

ParadaCarleton avatar Dec 08 '21 19:12 ParadaCarleton

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

  1. Bug fixes for the location and scale methods, which are currently returning incorrect answers. (They always return locationscale.μ and locationscale.σ, regardless of the location and scale of the underlying distribution. For instance, LocationScale(0, 1, Normal(2, 4)) currently returns (0, 1).)
  2. The implementations of new constructors and deprecation of the old LocationScale()-style constructor, as you requested.

All bugs were related to dealing with negative transformations of discrete distributions, where I forgot to account for the probability that x is exactly equal to X. There may be bugs regarding multivariate distributions that I haven't caught yet, since I was still writing the tests, but this PR should not contain any breaking changes.

If you'd like to rewrite the code for multivariate affine transformations from scratch, work out a new interface to replace the old LocationScale interface from scratch, etc. that's completely reasonable. For now, I'll gladly make a separate PR implementing the code I had ready a week and a half ago. This code made no changes except removing the positivity checks for LocationScale and throwing up a couple absolute value bars, before I'd rewritten all the constructors from scratch, deprecated or rewritten all of the old constructors to favor your own syntax using +, -, etc. instead of my original Affine which was a simple reskin of LocationScale, and implemented several features beyond the original scope of this PR at your request.

ParadaCarleton avatar Dec 08 '21 23:12 ParadaCarleton

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

Just to be sure there's no misunderstanding: We should not have separate distributions but at most type aliases (maybe they re not needed though). As mentioned above, it would be preferable to allow negative scales only in a second PR to keep changes in the different PRs minimal and make them easier to review.

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

Unfortunately, while the behaviour might not have changed (I don't know) the implementation has to a larger extent than what seemed necessary. During the different rounds of preliminary review I noticed and commented on methods that were modified slightly or generalized (sometimes incorrectly). Due to the renaming and other unrelated changes it is not possible for me to check with certainty if changes are needed, are bug fixes, are necessary or unnecessary generalizations, or are just style changes.

devmotion avatar Dec 08 '21 23:12 devmotion

Ok; if you'd like to implement renaming and all the different changes in a different PR, go ahead, then. I'll make a single PR that only changes LocationScale to allow negative scale values, or else (if you don't like this idea) I'll add reflections directly to a different package instead of spending more and more time making changes completely unrelated to this feature at your request.

ParadaCarleton avatar Dec 08 '21 23:12 ParadaCarleton

Sure, it's seemingly simple changes and of course it could be done in fewer PRs - but Distributions has > 300 direct dependents and > 700 indirect ones and hence I won't merge any PRs, in particular no refactorings, for which I feel I can't review them properly and guarantee with reasonable confidence that they don't break anything. Therefore I suggest breaking it up into small PRs that are easy to review.

devmotion avatar Dec 08 '21 23:12 devmotion