Distributions.jl
Distributions.jl copied to clipboard
Document that VonMises standard deviation and variance are circular
Hi,
I noticed that the VonMises standard deviation is wrong. Consider the following:
using Distributions
using Random
Random.seed!(65)
dist = VonMises(0.0, 2.0)
x = rand(dist, 100_000)
@show std(x)
@show std(dist)
Output
std(x) = 0.8741380214035047
std(dist) = 0.5497502542391335
Version
Distributions v0.25.86
Julia 1.9.0 rc2
Update:
I'm not familiar with variance vs. circular variance. Could this be the reason for the discrepancy? If so, it might be good to make a note in the documentation.
Could this be the reason for the discrepancy?
Yes. That is the difference. See https://en.wikipedia.org/wiki/Directional_statistics.
julia> sqrt(1 - abs(mean(exp.(complex.(0.0, x)))))
0.5492460106924333
Indeed, this could be better documented, so let's keep this issue open to track that. I've adjusted the title accordingly.
What's the decision, if any?
Currently, var is assumed to be circular:
julia> using Distribustions
julia> d = VonMises(0, 2)
julia> var(d)
0.30222534203599194
julia> var(rand(d, 10^6))
0.7641079384821371
julia> using DirectionalStatistics
julia> Circular.var(rand(d, 10^6))
0.3036343981785453
while std is probably just incorrect:
julia> std(d)
0.5497502542391335
julia> std(rand(d, 10^6))
0.8744747812547986
julia> Circular.std(rand(d, 10^6))
0.8485527593646777
The path forward seems either:
- Keep var as-is, fix std, update docs to refer to circular statistics
- Convert both var and std to regular (non-circular), keep docs as-is
Technically, the latter seems more consistent with other distributions... It can potentially be confusing if std(d) != std(rand(d, N)).
But are there actual scenarios when non-circular std/var of VonMises are needed?
- Keep var as-is, fix std, update docs to refer to circular statistics
I believe this is the intention; I'd be happy to review a PR on this if mentioned.
I can submit a PR to make a note about circular statistics. I'm not sure where to update the docs, but I identified at least three places where the docs could be updated. @ParadaCarleton, do you prefer any or all of these? Please advise.
https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariates.jl#L175-L179
https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariate/continuous/vonmises.jl#L58
https://github.com/JuliaStats/Distributions.jl/blob/2cd6c678a1f237462d71496c39178dab1e265c61/src/univariates.jl#L175-L179
Sorry. I misunderstood that std has an error. I assumed std(d) = sqrt(var(d)).
Technically, the latter seems more consistent with other distributions... It can potentially be confusing if
std(d) != std(rand(d, N)). But are there actual scenarios when non-circular std/var of VonMises are needed?
I find using circular variance for for circular distributions confusing. Non-circular std/var is reasonable to use when the distribution is (approximately) supported on an arc.
Von Mises Fisher distribution can be defined on any-dimensional sphere and has a valid, general notion of variance: http://www.math.ualberta.ca/~thillen/paper/vonmises.pdf . Were it implemented in Distributions.jl, should it use circular or generic definition when the sphere happens to be a circle?
For convenience, I added Circular.var() etc overloads for this distribution to DirectionalStatistics.jl – to return circular variance and std. Now, we have there:
Circular.mean(d) == Circular.mean(rand(d, N))
Circular.var(d) == Circular.var(rand(d, N))
Circular.std(d) == Circular.std(rand(d, N))
These relations are very nice to have for consistency!
So, maybe regular Statistics.std(VonMises) and var() should return regular std and var values, not circular ones? Then they'll also enjoy the same consistency.
Or throw an error if not useful/not possible analytically.