OnlineStats.jl

OnlineStats for Bayesian modeling?

Status: Open. cscherrer opened this issue 6 years ago • 3 comments

Hi Josh,

I've been moving toward MCMC results being in the form of an iterator instead of an array, and encouraging others in this direction as well. This brings convenience and flexibility in a lot of different ways.
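To make the iterator idea concrete, here is a minimal Python sketch. It is not the API of AdvancedHMC.jl, DynamicHMC.jl, or OnlineStats.jl, and `metropolis_draws` and `std_normal_logpdf` are hypothetical names. A random-walk Metropolis sampler is written as a generator, so draws are produced only when the consumer asks for them, and nothing forces a fixed-size array up front.

```python
import math
import random
from itertools import islice
from typing import Callable, Iterator


def metropolis_draws(
    log_density: Callable[[float], float],
    x0: float,
    step: float = 1.0,
    seed: int = 0,
) -> Iterator[float]:
    """Random-walk Metropolis sampler exposed as an infinite iterator.

    Nothing is stored: each draw is computed only when the consumer requests it.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    while True:
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        yield x


def std_normal_logpdf(x: float) -> float:
    # Unnormalized log density of N(0, 1); the constant cancels in the ratio.
    return -0.5 * x * x


# The consumer decides how many draws to take; here, just the first 1000.
draws = list(islice(metropolis_draws(std_normal_logpdf, x0=0.0), 1000))
```

Because the sampler is just an iterator, the decision of when to stop (a fixed count, a time budget, or a precision target) moves to the consumer.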

There seems to be some interest in this approach from the Turing team: https://github.com/TuringLang/AdvancedHMC.jl/issues/101#issuecomment-531494672

And Tamas Papp is also trying this out for DynamicHMC: https://github.com/tpapp/DynamicHMC.jl/pull/94

Have you done or seen anything in this direction for OnlineStats?

The general idea is to specify a stopping criterion, say a target standard error on the mean estimate of some function of the posterior sample. I think it would also be nice to have a way to deal with intermediate results.
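A stopping rule of this kind can be sketched in Python, assuming Welford's online update for the mean and variance and a target `tol` on the standard error of the mean of `f(x)`. The names `RunningMeanVar`, `run_until_precise`, and `iid_normal` are hypothetical. The standard error here assumes independent draws; for autocorrelated MCMC output, the draw count would have to be replaced by an effective sample size.

```python
import math
import random
from typing import Callable, Iterable, Iterator, Tuple


class RunningMeanVar:
    """Welford's online algorithm for the mean and the sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def fit(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def stderr(self) -> float:
        # Naive standard error: assumes independent draws.
        return math.sqrt(self.var / self.n) if self.n > 1 else float("inf")


def run_until_precise(
    draws: Iterable[float],
    f: Callable[[float], float],
    tol: float,
    min_draws: int = 100,
) -> Tuple[float, float, int]:
    """Consume draws until the standard error of mean(f(x)) drops below tol."""
    stats = RunningMeanVar()
    for x in draws:
        stats.fit(f(x))
        if stats.n >= min_draws and stats.stderr() < tol:
            break
    return stats.mean, stats.stderr(), stats.n


def iid_normal(seed: int = 0) -> Iterator[float]:
    # Stand-in for a sampler: an endless stream of independent N(0, 1) draws.
    rng = random.Random(seed)
    while True:
        yield rng.gauss(0.0, 1.0)


# Estimate E[x^2] under N(0, 1) to a standard error of 0.01.
mean, se, n = run_until_precise(iid_normal(), f=lambda x: x * x, tol=0.01)
```

Since the loop pulls one draw at a time, it is also a natural place to report intermediate results (the current `mean` and `stderr()`) before the criterion is met.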

A few things are needed for this approach, most already available:

  • Mean and variance
  • Standard error of mean estimate
  • Effective sample size, for use on its own and also in standard error. Depending on the context, this is computed in terms of autocorrelations or sample weights.
  • Rank-normalized R-hat
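For the weight-based case, a streaming sketch in Python might look like the following. It assumes Kish's effective sample size, (sum of w)^2 / (sum of w^2), and the approximation that the standard error of the weighted mean is the weighted standard deviation divided by sqrt(ESS). The weighted mean and variance use West's weighted variant of Welford's update. `WeightedStats` is a hypothetical name; the autocorrelation-based ESS and rank-normalized R-hat are not covered here.

```python
import math


class WeightedStats:
    """Streaming weighted mean and variance plus Kish's effective sample size."""

    def __init__(self) -> None:
        self.sum_w = 0.0
        self.sum_w2 = 0.0
        self.mean = 0.0
        self._s = 0.0  # weighted sum of squared deviations from the mean

    def fit(self, x: float, w: float = 1.0) -> None:
        if w <= 0:
            raise ValueError("weights must be positive")
        self.sum_w += w
        self.sum_w2 += w * w
        delta = x - self.mean
        self.mean += (w / self.sum_w) * delta  # West's weighted update
        self._s += w * delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Weighted variance, normalized by the total weight.
        return self._s / self.sum_w

    @property
    def ess(self) -> float:
        # Kish's effective sample size: (sum w)^2 / sum w^2.
        return self.sum_w ** 2 / self.sum_w2

    def stderr(self) -> float:
        # Approximate standard error of the weighted mean: sd / sqrt(ESS).
        return math.sqrt(self.var / self.ess)


# Equal weights give ESS = n; uneven weights shrink it.
equal = WeightedStats()
for v in [1.0, 2.0, 3.0, 4.0]:
    equal.fit(v)

uneven = WeightedStats()
for v, w in [(0.0, 1.0), (0.0, 1.0), (3.0, 2.0)]:
    uneven.fit(v, w)
```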

Any thoughts on this?

cscherrer, Sep 16 '19 03:09

I haven't done anything MCMC in a while, but I think OnlineStats has all the pieces you need (means, variances, and autocorrelations).

The implementations of Mean and Variance live in OnlineStatsBase, so if you want to keep dependencies minimal you can go that route. I should probably move AutoCov over there as well.
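Autocorrelations can be accumulated in one pass with bounded memory. The Python sketch below is not OnlineStats' `AutoCov`; it is a hypothetical `OnlineAutoCov` class that keeps one cross-product sum per lag plus the first and the most recent `max_lag` values, which is enough to recover the usual divide-by-n autocovariance estimates exactly. The hypothetical `ess_from_autocov` then turns these into an effective sample size, n / (1 + 2 * sum of rho_k), truncating the sum at the first negative autocorrelation. That truncation is a deliberately simple rule chosen for brevity, not the initial-sequence estimators used in practice.

```python
import random
from collections import deque
from typing import Deque, List


class OnlineAutoCov:
    """One-pass (divide-by-n) autocovariance estimates up to max_lag."""

    def __init__(self, max_lag: int) -> None:
        self.max_lag = max_lag
        self.n = 0
        self.total = 0.0
        self.cross = [0.0] * (max_lag + 1)  # cross[k] = sum over t of x_t * x_{t-k}
        self.head: List[float] = []  # first max_lag values seen
        self.tail: Deque[float] = deque(maxlen=max_lag)  # most recent max_lag values

    def fit(self, x: float) -> None:
        self.cross[0] += x * x
        for k in range(1, len(self.tail) + 1):
            self.cross[k] += x * self.tail[-k]  # tail[-k] is x_{t-k}
        if len(self.head) < self.max_lag:
            self.head.append(x)
        self.tail.append(x)
        self.total += x
        self.n += 1

    def autocov(self) -> List[float]:
        n, m = self.n, self.total / self.n
        recent = list(self.tail)
        gammas = []
        for k in range(min(self.max_lag, n - 1) + 1):
            # Sums of x_t and of x_{t-k} over t = k+1..n (1-based), used to
            # center the cross-products on the overall mean m.
            late = self.total - sum(self.head[:k])
            early = self.total - (sum(recent[-k:]) if k > 0 else 0.0)
            s = self.cross[k] - m * late - m * early + (n - k) * m * m
            gammas.append(s / n)
        return gammas


def ess_from_autocov(gammas: List[float], n: int) -> float:
    """n / (1 + 2 * sum of autocorrelations), stopping at the first negative one."""
    rho_sum = 0.0
    for g in gammas[1:]:
        rho = g / gammas[0]
        if rho < 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# A strongly autocorrelated AR(1) chain as a stand-in for MCMC output.
rng = random.Random(0)
acov = OnlineAutoCov(max_lag=50)
x = 0.0
for _ in range(20_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    acov.fit(x)
ess = ess_from_autocov(acov.autocov(), acov.n)
```

For a chain like this one, the ESS comes out far below the raw number of draws, which is exactly what the standard error needs to account for.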

joshday, Sep 16 '19 11:09

@cscherrer if you're thinking of making a BayesianOnlineStats package I'd be happy to contribute. It'll be a good excuse to spend more time thinking about how to work with streaming samples and to learn OnlineStats.

I think BFMI could be supported as well. It only requires Mean and Variance.
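A streaming E-BFMI can be sketched in Python, assuming the common estimator: the mean squared change in Hamiltonian energy between successive iterations, divided by the variance of the energy (here the divide-by-n variance). Consistent with the point above, only a running mean and a running variance are needed. `OnlineBFMI` is a hypothetical name, and the energies would come from the HMC sampler's per-iteration output.

```python
from typing import Optional


class OnlineBFMI:
    """Streaming E-BFMI: mean((E_t - E_{t-1})^2) / var(E)."""

    def __init__(self) -> None:
        self.prev: Optional[float] = None
        self.n_diff = 0
        self.mean_sq_diff = 0.0  # running mean of squared energy differences
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # Welford sum of squared deviations of the energies

    def fit(self, energy: float) -> None:
        if self.prev is not None:
            d2 = (energy - self.prev) ** 2
            self.n_diff += 1
            self.mean_sq_diff += (d2 - self.mean_sq_diff) / self.n_diff
        self.prev = energy
        # Welford update for the variance of the energies.
        self.n += 1
        delta = energy - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (energy - self.mean)

    def value(self) -> float:
        if self.n < 2 or self._m2 == 0.0:
            return float("nan")
        var = self._m2 / self.n  # divide-by-n variance
        return self.mean_sq_diff / var


bfmi = OnlineBFMI()
for e in [1.0, 2.5, 0.5, 3.0, 2.0]:
    bfmi.fit(e)
```

Low values of this ratio are the usual warning sign that the momentum resampling is not exploring the energy distribution well.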

sethaxen, Nov 23 '19 01:11

Nice! I haven't thought about this much in a few months, but I do think it's important. Currently the best I have is using Transducers: https://github.com/cscherrer/QuasiMonteCarlo.jl

There are really two independent concerns here (QMC and stream combinators), but this made a nice sandbox for trying out some ideas.

I think my mental model of the current Julia approach was a bit off. Haskell has a nice "stream fusion" approach that lets you apply a sequence of transformations to a stream without a performance penalty. Transducers is a bit like this turned on its head: there, the transformations compose nicely, as long as you don't actually apply them at each step.
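The transducer idea can be sketched in Python. This mirrors the concept behind Transducers.jl but is not its API; `mapping`, `filtering`, `compose`, and `transduce` are hypothetical names. A transducer turns one reducing function into another, so composition happens once, on the functions, and the data are then folded in a single pass with no intermediate collections between the steps.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Reducer = Callable[[Any, Any], Any]
Transducer = Callable[[Reducer], Reducer]


def mapping(f: Callable[[Any], Any]) -> Transducer:
    def xform(rf: Reducer) -> Reducer:
        return lambda acc, x: rf(acc, f(x))
    return xform


def filtering(pred: Callable[[Any], bool]) -> Transducer:
    def xform(rf: Reducer) -> Reducer:
        return lambda acc, x: rf(acc, x) if pred(x) else acc
    return xform


def compose(*xforms: Transducer) -> Transducer:
    # compose(a, b)(rf) == a(b(rf)): each element passes through a first, then b.
    def composed(rf: Reducer) -> Reducer:
        for xf in reversed(xforms):
            rf = xf(rf)
        return rf
    return composed


def transduce(xform: Transducer, rf: Reducer, init: Any, xs: Iterable[Any]) -> Any:
    # The transformation is applied to the reducing function once; the data
    # are then folded in a single pass.
    return reduce(xform(rf), xs, init)


# Sum of squares of the even numbers in 0..9, computed in one pass.
xform = compose(filtering(lambda x: x % 2 == 0), mapping(lambda x: x * x))
total = transduce(xform, lambda acc, x: acc + x, 0, range(10))  # 120
```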

cscherrer, Nov 23 '19 16:11