Chase Geigle
There should be a way to save intermediate model files between iterations of inference.
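A minimal sketch of one way to support this: let the sampler accept a checkpoint callback that fires every few iterations so the caller decides where and how to write the model. The names `run_inference`, `perform_iteration`, and `on_checkpoint` below are illustrative, not existing MeTA API.

```cpp
#include <cstddef>
#include <functional>

// Illustrative only: run inference, invoking a user-supplied callback every
// `save_period` iterations so the caller can checkpoint the model to disk.
void run_inference(std::size_t num_iters, std::size_t save_period,
                   const std::function<void(std::size_t)>& on_checkpoint) {
    for (std::size_t iter = 0; iter < num_iters; ++iter) {
        // perform_iteration(iter); // existing per-iteration update (assumed)
        if (on_checkpoint && save_period > 0 && iter % save_period == 0)
            on_checkpoint(iter); // e.g. save to "model-iter-<iter>"
    }
}
```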
We should benchmark against existing LDA implementations (like Mallet) as a sanity check.
You should be able to load a model from a stream.
Model saving should support writing to streams instead of fixed files, and should use the binary format from `io::packed`.
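A rough sketch of what the stream-based save/load in the two items above could look like. The stub below uses raw binary stream writes where the real implementation would call `io::packed::write` / `io::packed::read`; the struct and member names are illustrative only.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <vector>

// Hypothetical model stub: a real lda_model would serialize its counts the
// same way, but through io::packed rather than the raw writes shown here.
struct model_stub {
    std::uint64_t num_topics;
    std::vector<double> phi; // flattened topic-word distributions

    void save(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(&num_topics), sizeof(num_topics));
        std::uint64_t size = phi.size();
        out.write(reinterpret_cast<const char*>(&size), sizeof(size));
        out.write(reinterpret_cast<const char*>(phi.data()),
                  static_cast<std::streamsize>(size * sizeof(double)));
    }

    void load(std::istream& in) {
        in.read(reinterpret_cast<char*>(&num_topics), sizeof(num_topics));
        std::uint64_t size = 0;
        in.read(reinterpret_cast<char*>(&size), sizeof(size));
        phi.resize(size);
        in.read(reinterpret_cast<char*>(phi.data()),
                static_cast<std::streamsize>(size * sizeof(double)));
    }
};
```

Taking streams rather than filenames means the same code path works for files, in-memory buffers, and compressed streams.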
The SCVB0 implementation should have an interface that reflects its stochastic nature (allowing it to fit new documents in a streaming fashion), mirroring the online classifier interface.
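A hypothetical interface sketch for that; the method names are illustrative and do not reflect MeTA's actual online classifier API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: an SCVB0 inferencer that consumes documents one at a
// time, mirroring an online classifier's per-instance training call.
class stochastic_inferencer {
  public:
    // Update the topic statistics from a single new document (a minibatch
    // of size one).
    virtual void partial_fit(const std::vector<std::size_t>& doc) = 0;

    // Infer a topic distribution for a document without updating the model.
    virtual std::vector<double> transform(const std::vector<std::size_t>& doc) const = 0;

    virtual ~stochastic_inferencer() = default;
};
```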
We should add a parallel implementation of CVB0, just like we have a parallel implementation of collapsed Gibbs sampling.
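One common way to parallelize this (as in approximate distributed LDA) is to split the documents across threads, update against thread-local copies of the topic-word counts, and merge afterwards. The sketch below only shows the thread scaffolding; `update_document` and the merge step are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative only: one CVB0 sweep with documents partitioned across threads.
void parallel_cvb0_iteration(std::size_t num_docs, std::size_t num_threads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (num_docs + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([=]() {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(begin + chunk, num_docs);
            for (std::size_t d = begin; d < end; ++d) {
                // update_document(d): CVB0 variational update against
                // thread-local copies of the topic-word counts (assumed)
            }
        });
    }
    for (auto& w : workers)
        w.join();
    // merge thread-local count deltas into the global counts here (assumed)
}
```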
We should optimize the alpha and beta values during inference (see Hannah Wallach's thesis, chapter 2).
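For reference, a Minka-style fixed-point update for the per-topic alpha (the form used in that chapter) could look roughly like this; `digamma` is implemented inline since the standard library does not provide one, and all names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Simple digamma approximation (recurrence plus asymptotic expansion).
double digamma(double x) {
    double result = 0.0;
    while (x < 6.0) {
        result -= 1.0 / x;
        x += 1.0;
    }
    double inv2 = 1.0 / (x * x);
    return result + std::log(x) - 0.5 / x
           - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0));
}

// One fixed-point update of the per-topic alpha. n_dk[d][k] is the number of
// tokens in document d assigned to topic k; n_d[d] is the length of document
// d. The analogous update applies to beta over the topic-word counts.
std::vector<double> update_alpha(const std::vector<double>& alpha,
                                 const std::vector<std::vector<double>>& n_dk,
                                 const std::vector<double>& n_d) {
    double alpha0 = std::accumulate(alpha.begin(), alpha.end(), 0.0);
    double denom = 0.0;
    for (double len : n_d)
        denom += digamma(len + alpha0) - digamma(alpha0);

    std::vector<double> next(alpha.size());
    for (std::size_t k = 0; k < alpha.size(); ++k) {
        double numer = 0.0;
        for (const auto& doc : n_dk)
            numer += digamma(doc[k] + alpha[k]) - digamma(alpha[k]);
        next[k] = alpha[k] * numer / denom;
    }
    return next;
}
```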
We should revisit our alpha and beta default values. 1.0 is _way_ too large.
The Gibbs sampling inference methods should allow taking multiple samples to estimate theta and phi.
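A sketch of the per-sample estimate that would be averaged: after burn-in, every `lag` iterations compute theta (and analogously phi) from the current counts and accumulate a running mean, rather than reading off only the final sample. Names below are illustrative and a symmetric alpha is assumed.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: theta estimate for one document from the current Gibbs
// sample. n_dk[k] is the count of tokens in the document assigned to topic k.
std::vector<double> theta_estimate(const std::vector<double>& n_dk, double alpha) {
    double total = 0.0;
    for (double c : n_dk)
        total += c + alpha;
    std::vector<double> theta(n_dk.size());
    for (std::size_t k = 0; k < n_dk.size(); ++k)
        theta[k] = (n_dk[k] + alpha) / total;
    return theta;
}
```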