Import StatsBase into Statistics
Equivalent of https://github.com/JuliaLang/julia/pull/31395. See original discussions there. See #87 for current design discussion.
This PR is against the StatsBase branch, as the idea is to port all the features we want from StatsBase to it before we clean up the rest (and possibly purge from the history the features we don't want). Only then will we be able to merge it into master with a clean history (including StatsBase's). (I created the StatsBase branch using `git merge master --allow-unrelated-histories -s ours` after fetching the StatsBase history from its repo.)
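For readers unfamiliar with that merge invocation, here is a minimal, self-contained sketch of joining two unrelated histories with the `ours` strategy. It runs entirely in a temporary directory; the two throwaway repositories, the file contents, and the commit messages are hypothetical stand-ins, and it assumes the merge is run from a branch that holds the fetched history. It is not a record of the exact commands used for this PR.

```shell
set -eu

# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the StatsBase repository.
git init -q statsbase
cd statsbase
git config user.email "demo@example.com"
git config user.name "Demo"
echo "# weights" > weights.jl
git add weights.jl
git commit -q -m "StatsBase: add weights.jl"
cd ..

# Hypothetical stand-in for the Statistics.jl repository.
git init -q statistics
cd statistics
git config user.email "demo@example.com"
git config user.name "Demo"
echo "# Statistics" > Statistics.jl
git add Statistics.jl
git commit -q -m "Statistics: add Statistics.jl"

# The default branch may be "master" or "main" depending on local git config.
main_branch=$(git symbolic-ref --short HEAD)

# Fetch the unrelated history and start a branch from it.
git fetch -q ../statsbase HEAD
git checkout -q -b StatsBase FETCH_HEAD

# Record the default branch as a second parent while keeping this branch's tree.
git merge -q --allow-unrelated-histories -s ours \
    -m "Merge $main_branch into StatsBase" "$main_branch"

# Count the parents of the resulting commit (the first word is the commit itself).
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
parents=$((parents - 1))
echo "parents of merge commit: $parents"
ls
```

With `-s ours`, the merge commit links both histories but leaves the working tree exactly as it was on the current branch, so any files wanted from the other side have to be brought in by later commits.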
Progress:
- [x] common.jl
- [x] weights.jl: skipped `wsum`
- [x] moments.jl: skipped `moment`
- [x] scalarstats.jl: skipped `mean_and_var`, `mean_and_std`, `zscore`/`zscore!`, merged `nquantile` with `quantile`
- [x] robust.jl
- [ ] deviation.jl
- [x] cov.jl: skipped `mean_and_cov`; skip `scattermat`?
- [ ] counts.jl
- [x] ranking.jl
- [x] toeplitzsolvers.jl
- [x] rankcorr.jl
- [x] signalcorr.jl
- [x] partialcor.jl
- [x] empirical.jl
- [x] hist.jl
- [ ] pairwise.jl
- [x] reliability.jl
- [ ] misc.jl
- [x] sampling.jl
- [x] statmodels.jl: move to StatsAPI
- [x] transformations.jl
Seems reasonable to me I guess. Would prefer if history were maintained even if that includes broken commits, but I'd otherwise recommend including a reference to StatsBase.jl. I'm mostly just thinking that JuliaStats/StatsBase.jl#250 included a lot of design choices that folks might be curious about (e.g., varcorrection, naming, links).
The history is completely preserved in the StatsBase branch that this PR is against. I didn't include it here to make it possible to review changes.
Oops, my bad, that would explain why this was much easier to review. Thanks for sorting that out.
I would like to help with this effort. What does this PR need? A review? Other feedback?
My understanding is that this PR is basically the union of Statistics.jl and StatsBase. This will all turn into Statistics, which will be its own package and not part of the Julia repo.
Things that absolutely need to live in the Julia repo, such as mean and sum, will be moved to an as-yet-unnamed file there.
Statistics.jl is already on its own repo.
If you want to help with the port, you can grab the nl/weightedstats branch of this repo and add commits that insert more include calls in src/Statistics.jl and test/runtests.jl. On this branch, all files from StatsBase are there, but most are ignored: we need to include them and modify them as needed to pass tests. For weighted stats I also had to do a lot of cleanup, but for other files only minor changes should be needed. For example, scalarstats.jl should be easy. I had written a list of what I think should be done at https://github.com/JuliaLang/julia/pull/27152#issuecomment-390687566.
I am working on scalarstats.jl right now. If something is in Statistics.jl but could go in scalarstats.jl, should I move it out of Statistics.jl and into scalarstats.jl?
Feel free to move a few things to new files. Putting everything in src/Statistics.jl was OK when the package was small, but some reorganization is needed as it grows.
Great effort, guys! I'd love to see the StatsBase functionality in the Statistics stdlib.
What about weighted sampling (https://juliastats.org/StatsBase.jl/stable/sampling/#Sampling-from-Population-1)? Will this go into Statistics as well? (It also has semantic overlap with Random, I guess.)