Distributions.jl
Distributions.jl copied to clipboard
Implement ExpGamma sampler (draft)
As discussed in #1886 , here is a branch with only the ExpGamma additions I made. This is hopefully of use to @chelate.
Potential TODOs:
- Figure out a reasonable threshold to switch between the Liu-Martin-Syring Small Shape sampler and the Maraglia-Tsang Inverse Power sampler.
- Possibly implement Algorithm 3 from Kundu and Gupta (https://home.iitk.ac.in/~kundu/paper120.pdf) for when alpha > 0.3
Codecov Report
:x: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.97%. Comparing base (05a7875) to head (d678904).
:warning: Report is 19 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/samplers/expgamma.jl | 0.00% | 23 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #1958 +/- ##
==========================================
- Coverage 86.20% 85.97% -0.23%
==========================================
Files 146 147 +1
Lines 8769 8792 +23
==========================================
Hits 7559 7559
- Misses 1210 1233 +23
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
I found an issue in ExpGammaIPSampler that is beyond my understanding. The values it generates are too high. If you undo the log on them, the output has a low bound of 1 instead of 0. I don't think it's a sign issue.
julia> rand(rng, GammaIPSampler(Gamma(0.9)), 10)
10-element Vector{Float64}:
0.2135985770580691
1.137650455835034
1.2565944033941983
0.656188185739729
0.4203566698386437
1.66687553573059
0.2053545534917769
0.41111662368149415
0.14513700772532737
0.6549826409710889
julia> exp.(rand(rng, ExpGammaIPSampler(Gamma(0.9)), 10))
10-element Vector{Float64}:
2.051378946898306
1.0193261057366838
2.0454369373226133
1.1704569694617193
1.363270090050132
3.747017831243905
2.1354060996412194
1.0657728289135096
1.173677653737439
1.0294127958057897
and
julia> rand(rng, GammaIPSampler(Gamma(0.01)), 10)
10-element Vector{Float64}:
1.836714536149393e-57
1.5105073011994874e-50
3.537147232822205e-36
1.2547571475373618e-32
5.885231757348809e-38
2.9054960529000983e-34
1.2110309289241029e-6
1.4231361276813385e-101
2.393431274684707e-27
4.809894203977362e-38
julia> exp.(rand(rng, ExpGammaIPSampler(Gamma(0.01)), 10))
10-element Vector{Float64}:
1.0207709469175534
1.0000000000000002
1.0000000000000946
1.0
1.0
1.0
1.0
1.0000000000087867
1.0
1.000000000000009
The code looks mathematically correct to me.
It's about 2x faster than ExpGammaSSSampler, but ExpGammaSSSampler returns values in the expected range while ExpGammaIPSampler is broken.
I found the issue; one of the ExpGammaIPSampler methods was actually returning a GammaIPSampler.
After correcting that, I have done some basic testing. This means I've checked for rates of non-finite values and duplicates. I have not checked past that. If something "works", it means it's generating unique finite values.
I have found that:
-
The Log Inverse Power sampler is always faster; runtime is independent of alpha.
-
The Small Shape sampler is at its fastest at alpha ~= 0.01. At this point, it still takes about 150% as much time as the Log Inverse Power sampler. At its worst, it's probably takes about 400% as much time.
-
Both samplers perform without issue until around 7e-308
-
Starting around 7e-308, the Small Shape sampler performs slightly worse (~10% more
-Infs) until around 7e-309 -
The Log Inverse Power sampler rapidly fails below 7e-309, generating almost pure
-Infs by 5e-309 -
The Small Shape sampler continues to generate unique finite values down to around 1e-314, but the rate is only about 2% by 1e-310
I'll make some charts later.
Note that I have NOT looked into how "accurate" the numbers generated have been yet. It is very much possible that the Log Inverse Power sampler is generating nonsense at some point.
Thanks for the work and documentation! I am able to look into this in July, but I will probably need lots of help because I am not much of a coder. That said, I am really really annoyed by the lack of a ExpGamma sampler and will do my best.
Personally, I don't think what happens below 1e-308 is really important, and the IPS sample seems to "dominate" to use the statistical parlance. I mean with gamma occasionally giving zeros at like 1e-3 or something, Id hardly worry about something that is slightly suboptimal 300 orders of magnitude later, right?