Distributions.jl

Add Wrapped distribution wrapper

Open sethaxen opened this issue 2 years ago • 6 comments

Adds a Wrapped distribution that wraps an original distribution around some interval. Optionally a parameter k is used to indicate that it should be multiply-wrapped, i.e. that the resulting wrapped distribution should have a k-fold periodicity in the interval.
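
For reference, a sketch of the standard wrapping construction for the singly-wrapped case, as it is usually stated in the directional statistics literature. The symbols are assumptions introduced here: f is the density of the original distribution, [l, u) is the target interval, and L = u - l is its length. The k-fold variant in this PR is not covered by this formula.

```latex
f_w(\theta) \;=\; \sum_{n=-\infty}^{\infty} f(\theta + nL),
\qquad \theta \in [l, u), \quad L = u - l .
```

With l = -π and u = π this reduces to the familiar wrapping around a circle of circumference 2π.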

Fixes #1716 and #1715

sethaxen avatar May 18 '23 21:05 sethaxen

This version of Wrapped allows for any range and periodicity to be specified as fields, but it comes with the trade-off that for distributions like Wrapped Cauchy, where we have algorithms to fit it well, we're not able to because the usual fit(::Type{<:Distribution}, x) interface doesn't allow specifying the upper and lower bounds or k. It would be nice if there were an alternate interface for wrapper distributions like these.

sethaxen avatar May 18 '23 21:05 sethaxen

Codecov Report

Patch coverage has no change; project coverage changed by -5.14%.

Comparison is base (ef42afb) 85.89% compared to head (3da8e84) 80.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1724      +/-   ##
==========================================
- Coverage   85.89%   80.76%   -5.14%     
==========================================
  Files         139      144       +5     
  Lines        8389     7122    -1267     
==========================================
- Hits         7206     5752    -1454     
- Misses       1183     1370     +187     
Impacted Files                Coverage Δ
src/Distributions.jl          100.00% <ø> (ø)
src/wrapped.jl                0.00% <0.00%> (ø)
src/wrapped/cauchy.jl         0.00% <0.00%> (ø)
src/wrapped/exponential.jl    0.00% <0.00%> (ø)
src/wrapped/normal.jl         0.00% <0.00%> (ø)

... and 131 files with indirect coverage changes


codecov-commenter avatar May 18 '23 21:05 codecov-commenter

Also #1665 adds a WrappedCauchy distribution. This implementation is a little more general because it also allows for k-wrapping and allows other intervals than those of length 2π to be used.

sethaxen avatar May 20 '23 10:05 sethaxen

Before I spend more time polishing this and adding a test suite, a few questions:

  • Is the generalization from intervals of length 2π to arbitrary intervals welcome?
  • Is the generalization to build distributions with k-fold symmetry welcome?
  • For the Wrapped Normal, Wrapped Cauchy, and Wrapped Exponential, is this approach of overloading the functions for Wrapped preferred, or is it preferred to define special named distributions as in #1665?
  • How to define fit methods? (see https://github.com/JuliaStats/Distributions.jl/pull/1724#issuecomment-1553668551)

@devmotion @ararslan

sethaxen avatar May 21 '23 13:05 sethaxen

I should preface this by saying that I know almost nothing about wrapped distributions, so any opinions expressed here are not strongly held and may be uninformed.

> Is the generalization from intervals of length 2π to arbitrary intervals welcome?

I don't see why not. ¯\_(ツ)_/¯ I assume that's something that's not uncommon in practice and/or would be difficult to achieve with a simple data transformation or similar?

> Is the generalization to build distributions with k-fold symmetry welcome?

I don't see why not. ¯\_(ツ)_/¯ Same general question though.

> For the Wrapped Normal, Wrapped Cauchy, and Wrapped Exponential, is this approach of overloading the functions for Wrapped preferred, or is it preferred to define special named distributions as in #1665?

Defining methods for the Wrapped wrapper has prior art with e.g. Truncated. In fact, there used to be a TruncatedNormal that was deprecated in favor of Truncated{Normal}. So I think your approach here is more consistent and extensible.

> How to define fit methods?

Not all distributions have fit (or rather, fit_mle) methods, notably including Truncated-wrapped distributions, so I personally think it would be okay to punt on a decision for now and add it at a later time after some more extended design discussion.

> This version of Wrapped allows for any range and periodicity to be specified as fields, but it comes with the trade-off that for distributions like Wrapped Cauchy, where we have algorithms to fit it well, we're not able to because the usual fit(::Type{<:Distribution}, x) interface doesn't allow specifying the upper and lower bounds or k.

Could k be made a type parameter, similar to the dimensionality parameter for Array? Then the fit method could be something like fit(Wrapped{Cauchy,1}, x). That doesn't solve the bounds though.
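
A minimal sketch of how a periodicity type parameter could be read back out by dispatch, in the spirit of the dimensionality parameter of Array. Everything here is hypothetical and written in base Julia only: WrappedSketch, CauchySketch, periodicity, and fit_sketch are illustrative names, not the types in this PR and not the Distributions.jl fit interface.

```julia
# Hypothetical stand-in for a real distribution type such as Cauchy.
struct CauchySketch
    μ::Float64
    σ::Float64
end

# Hypothetical wrapper: K (the periodicity) lives in the type, the bounds in fields.
struct WrappedSketch{D,K,T<:Real}
    unwrapped::D   # the distribution being wrapped
    lower::T       # lower bound of the interval
    upper::T       # upper bound of the interval
end

# K cannot be inferred from the fields, so it is supplied as a type parameter.
WrappedSketch{D,K}(d::D, lower::T, upper::T) where {D,K,T<:Real} =
    WrappedSketch{D,K,T}(d, lower, upper)

# The periodicity is recovered from the type alone.
periodicity(::WrappedSketch{D,K}) where {D,K} = K

# A fit-style entry point can read D and K from the requested type.
# The bounds are still unavailable here, which is the open problem in the thread.
fit_sketch(::Type{WrappedSketch{D,K}}, x::AbstractVector{<:Real}) where {D,K} =
    (family = D, k = K, n = length(x))

d = WrappedSketch{CauchySketch,2}(CauchySketch(0.0, 1.0), 0.0, 2π)
r = fit_sketch(WrappedSketch{CauchySketch,1}, [0.1, 0.2, 0.3])
```

Here fit_sketch only reports what it can see from the type; a real method would run an estimator. The point of the sketch is that a call of the form fit(Wrapped{Cauchy,1}, x) carries k but has nowhere to carry the interval.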

ararslan avatar May 25 '23 17:05 ararslan

> > Is the generalization from intervals of length 2π to arbitrary intervals welcome?
>
> I don't see why not. ¯\_(ツ)_/¯ I assume that's something that's not uncommon in practice and/or would be difficult to achieve with a simple data transformation or similar?

It's certainly useful. For example, one might want to use [0, 2π) or [-π, π), or one might want to use degrees or days of the year. To support discrete distributions like wrapped Poisson, it becomes necessary.
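
To make the interval examples concrete, here is a small base-Julia sketch of reducing a value into an arbitrary half-open interval. The helper name wrap_to_interval is hypothetical and is not part of Distributions.jl or this PR; the degree and day-of-year bounds are assumptions chosen for illustration.

```julia
# Hypothetical helper: map a real value into [lower, upper) by reducing
# modulo the interval length.
function wrap_to_interval(x::Real, lower::Real, upper::Real)
    lower < upper || throw(ArgumentError("lower must be less than upper"))
    period = upper - lower
    return lower + mod(x - lower, period)
end

deg = wrap_to_interval(370, 0, 360)       # degrees on [0, 360)
day = wrap_to_interval(-1, 1, 366)        # 1-based day of a 365-day year
ang = wrap_to_interval(3π / 2, -π, π)     # radians on [-π, π)
```

With integer inputs the result stays an integer, which is the property a discrete case such as a wrapped Poisson would rely on.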

> > Is the generalization to build distributions with k-fold symmetry welcome?
>
> I don't see why not. ¯\_(ツ)_/¯ Same general question though.

I did some searching, and the only mention I can find of k-times wrapping is in Directional Statistics by Mardia and Jupp, where they give no references for papers that use it. I suspect the most likely version to use besides k=1 is k=2, which turns any circular distribution into an axial one.

> Could k be made a type parameter, similar to the dimensionality parameter for Array? Then the fit method could be something like fit(Wrapped{Cauchy,1}, x). That doesn't solve the bounds though.

Yes, I think this is a reasonable solution. For bounds, I propose the fallback wrapped(d::ContinuousDistribution) = wrapped(d, -π, π) (taking care to not promote types unnecessarily; too bad there's no irrational). Then if one has data with a different period than 2π, they can scale it before fitting. Not perfect, but it works.

If we want Wrapped{<:Cauchy} and Wrapped{<:Normal} to be like VonMises, where the support is [μ-π, μ+π), we can change the default e.g. wrapped(d::Normal) = wrapped(d, d.μ-π, d.μ+π)
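
The "scale it before fitting" workaround could look roughly like the following base-Julia sketch. The helper names to_angle and from_angle are hypothetical, and the hours-of-day data is an assumption used only for illustration; neither helper exists in Distributions.jl or in this PR.

```julia
# Hypothetical helper: map data on [lower, upper) to angles on [-π, π).
to_angle(x::Real, lower::Real, upper::Real) =
    -π + 2π * mod(x - lower, upper - lower) / (upper - lower)

# Hypothetical inverse: map an angle back onto [lower, upper).
from_angle(θ::Real, lower::Real, upper::Real) =
    lower + (upper - lower) * mod(θ + π, 2π) / (2π)

hours  = [0.0, 6.0, 12.0, 18.0]            # times of day on [0, 24)
angles = to_angle.(hours, 0, 24)           # rescaled to [-π, π) before fitting
back   = from_angle.(angles, 0, 24)        # fitted locations can be mapped back
```

A fit on angles would then use the default bounds, and any location estimate would be mapped back with from_angle. Scale parameters would need the factor (upper - lower) / (2π) applied separately, which is part of why this is workable rather than ideal.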

sethaxen avatar May 27 '23 13:05 sethaxen