Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

Document that VonMises standard deviation and variance are circular

Open itsdfish opened this issue 2 years ago • 7 comments

Hi,

I noticed that the VonMises standard deviation is wrong. Consider the following:

using Distributions
using Random 

Random.seed!(65)

dist = VonMises(0.0, 2.0)
x = rand(dist, 100_000)
@show std(x)
@show std(dist)

Output

std(x) = 0.8741380214035047
std(dist) = 0.5497502542391335

Version

Distributions v0.25.86
Julia 1.9.0 rc2

Update:

I'm not familiar with variance vs. circular variance. Could this be the reason for the discrepancy? If so, it might be good to make a note in the documentation.

itsdfish avatar Apr 01 '23 15:04 itsdfish

Could this be the reason for the discrepancy?

Yes. That is the difference. See https://en.wikipedia.org/wiki/Directional_statistics.

julia> sqrt(1 - abs(mean(exp.(complex.(0.0, x)))))
0.5492460106924333

Indeed, this could be better documented, so let's keep this issue open to track that. I've adjusted the title accordingly.

andreasnoack avatar Apr 02 '23 15:04 andreasnoack

What's the decision, if any?

Currently, var is assumed to be circular:

julia> using Distribustions

julia> d = VonMises(0, 2)

julia> var(d)
0.30222534203599194

julia> var(rand(d, 10^6))
0.7641079384821371

julia> using DirectionalStatistics

julia> Circular.var(rand(d, 10^6))
0.3036343981785453

while std is probably just incorrect:

julia> std(d)
0.5497502542391335

julia> std(rand(d, 10^6))
0.8744747812547986

julia> Circular.std(rand(d, 10^6))
0.8485527593646777

The path forward seems either:

  • Keep var as-is, fix std, update docs to refer to circular statistics
  • Convert both var and std to regular (non-circular), keep docs as-is

Technically, the latter seems more consistent with other distributions... It can potentially be confusing if std(d) != std(rand(d, N)). But are there actual scenarios when non-circular std/var of VonMises are needed?

aplavin avatar Sep 02 '23 19:09 aplavin

  • Keep var as-is, fix std, update docs to refer to circular statistics

I believe this is the intention; I'd be happy to review a PR on this if mentioned.

ParadaCarleton avatar Sep 03 '23 16:09 ParadaCarleton

I can submit a PR to make a note about circular statistics. I'm not sure where to update the docs, but I identified at least three places where the docs could be updated. @ParadaCarleton, do you prefer any or all of these? Please advise.

https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariates.jl#L175-L179

https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariate/continuous/vonmises.jl#L58

https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariates.jl#L175-L179

itsdfish avatar Sep 04 '23 21:09 itsdfish

Sorry. I misunderstood that std has an error. I assumed std(d) = sqrt(var(d)).

itsdfish avatar Sep 04 '23 22:09 itsdfish

Technically, the latter seems more consistent with other distributions... It can potentially be confusing if std(d) != std(rand(d, N)). But are there actual scenarios when non-circular std/var of VonMises are needed?

I find using circular variance for for circular distributions confusing. Non-circular std/var is reasonable to use when the distribution is (approximately) supported on an arc.

Von Mises Fisher distribution can be defined on any-dimensional sphere and has a valid, general notion of variance: http://www.math.ualberta.ca/~thillen/paper/vonmises.pdf . Were it implemented in Distributions.jl, should it use circular or generic definition when the sphere happens to be a circle?

mateuszbaran avatar Sep 29 '23 11:09 mateuszbaran

For convenience, I added Circular.var() etc overloads for this distribution to DirectionalStatistics.jl – to return circular variance and std. Now, we have there:

Circular.mean(d) == Circular.mean(rand(d, N))
Circular.var(d) == Circular.var(rand(d, N))
Circular.std(d) == Circular.std(rand(d, N))

These relations are very nice to have for consistency! So, maybe regular Statistics.std(VonMises) and var() should return regular std and var values, not circular ones? Then they'll also enjoy the same consistency. Or throw an error if not useful/not possible analytically.

aplavin avatar Dec 29 '23 00:12 aplavin