Statistics.jl

Import StatsBase into Statistics

Open nalimilan opened this issue 6 years ago • 7 comments

Equivalent of https://github.com/JuliaLang/julia/pull/31395; see the original discussion there, and #87 for the current design discussion.

This PR is against the StatsBase branch. The idea is to port all the features we want from StatsBase to that branch before cleaning up the rest (and possibly purging the history of features we don't want). Only then will we be able to merge it into master with a clean history (including StatsBase's). (I created the StatsBase branch by fetching the StatsBase history from its repo and then running git merge master --allow-unrelated-histories -s ours.)
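For readers unfamiliar with the merge trick in the parenthetical above, here is a minimal local sketch of what the command does. It uses two throwaway repositories as hypothetical stand-ins for Statistics.jl and StatsBase.jl. It also assumes the StatsBase branch was started from the fetched StatsBase history, which the comment does not state explicitly.

```shell
#!/bin/sh
# Sketch of `git merge --allow-unrelated-histories -s ours` on two throwaway
# local repositories; "statsbase" and "statistics" are hypothetical stand-ins,
# not clones of the real packages.
set -e

# Fixed identity so the demo commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for StatsBase.jl, with its own history.
git init -q "$work/statsbase"
cd "$work/statsbase"
echo 'module StatsBase end' > StatsBase.jl
git add StatsBase.jl
git commit -q -m 'StatsBase history'

# Stand-in for Statistics.jl; its master shares no commits with the repo above.
git init -q "$work/statistics"
cd "$work/statistics"
echo 'module Statistics end' > Statistics.jl
git add Statistics.jl
git commit -q -m 'Statistics history'
git branch -M master

# Fetch the StatsBase history and start the StatsBase branch from it
# (assumption: the comment does not say which commit the branch started from).
git fetch -q "$work/statsbase" HEAD
git checkout -q -b StatsBase FETCH_HEAD

# --allow-unrelated-histories lets git merge histories with no common ancestor;
# -s ours records master as a parent but keeps this branch's files unchanged.
git merge -q --allow-unrelated-histories -s ours -m 'Merge master' master

# Print the merge commit and its two parents, then the working tree.
git rev-list --parents -n 1 HEAD
ls
```

The listing shows only StatsBase.jl, yet master's commit is reachable from the branch. The Statistics history is therefore recorded in the branch even though none of its files were brought in.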

Progress:

  • [x] common.jl
  • [x] weights.jl: skipped wsum
  • [x] moments.jl: skipped moment
  • [x] scalarstats.jl: skipped mean_and_var, mean_and_std, zscore/zscore!, merged nquantile with quantile
  • [x] robust.jl
  • [ ] deviation.jl
  • [x] cov.jl: skipped mean_and_cov; skip scattermat?
  • [ ] counts.jl
  • [x] ranking.jl
  • [x] toeplitzsolvers.jl
  • [x] rankcorr.jl
  • [x] signalcorr.jl
  • [x] partialcor.jl
  • [x] empirical.jl
  • [x] hist.jl
  • [ ] pairwise.jl
  • [x] reliability.jl
  • [ ] misc.jl
  • [x] sampling.jl
  • [x] statmodels.jl: move to StatsAPI
  • [x] transformations.jl

nalimilan, Sep 28 '19 14:09

Seems reasonable to me, I guess. I'd prefer that the history be maintained even if that includes broken commits; otherwise, I'd recommend including a reference to StatsBase.jl. I'm mostly thinking that JuliaStats/StatsBase.jl#250 included a lot of design choices that folks might be curious about (e.g., varcorrection, naming, links).

The history is completely preserved in the StatsBase branch that this PR is against. I didn't include it here to make it possible to review changes.

nalimilan, Oct 28 '19 14:10


Oops, my bad, that would explain why this was much easier to review. Thanks for sorting that out.

rofinn, Oct 28 '19 16:10

I would like to help with this effort. What does this PR need? A review? Other feedback?

My understanding is that this PR is basically the union of Statistics.jl and StatsBase. This will all turn into Statistics, which will be its own package and not part of the Julia repo.

Things that absolutely need to live in the Julia repo, like mean, sum, etc., will be moved to an as-yet-unnamed file there.

pdeffebach, Mar 18 '20 14:03

Statistics.jl is already on its own repo.

If you want to help with the port, you can grab the nl/weightedstats branch of this repo and add commits that insert more include calls in src/Statistics.jl and test/runtests.jl. On this branch, all files from StatsBase are there, but most are ignored: we need to include them and modify them as needed to pass tests. For weighted stats I also had to do a lot of cleanup, but for other files only minor changes should be needed. For example, scalarstats.jl should be easy. I had written a list of what I think should be done at https://github.com/JuliaLang/julia/pull/27152#issuecomment-390687566.

nalimilan, Mar 18 '20 15:03

I am working on scalarstats.jl right now. If something is in Statistics.jl but could go in scalarstats.jl, should I move it out of Statistics.jl and into scalarstats.jl?

pdeffebach, Mar 29 '20 21:03

Feel free to move a few things to new files. Putting everything in src/Statistics.jl was OK when the package was small, but some reorganization is needed as it grows.

nalimilan, Mar 30 '20 07:03

Great effort guys! I'd love to see the StatsBase functionality in the Statistics stdlib.

What about weighted sampling (https://juliastats.org/StatsBase.jl/stable/sampling/#Sampling-from-Population-1)? Will this go into Statistics as well? (I guess it also overlaps semantically with Random.)

carstenbauer, May 22 '21 07:05