bayeslite

Implement clone for generators

Open fsaad opened this issue 9 years ago • 4 comments

We require some way to clone a generator (what in Python would be copy.deepcopy(generator)). An API method (without BQL surface syntax) would be good enough until we decide whether this feature is actually desirable. The issue is blocking for the IAP class lab on Thursday. Otherwise we will have to carry around roughly 1000 bdb files, which is quite nightmarish.

fsaad avatar Jan 10 '16 22:01 fsaad

Date: Sun, 10 Jan 2016 14:15:49 -0800 From: F Saad

We require some way to clone a generator (what in Python would be copy.deepcopy(generator)). An API method (without BQL surface syntax) would be good enough. The issue is blocking for the IAP class lab on Thursday. Otherwise we will have to carry around roughly 1000 bdb files, which is quite nightmarish.

Explain why?

riastradh-probcomp avatar Jan 11 '16 00:01 riastradh-probcomp

We are interested in running BQL queries interleaved with analysis. In other words:

  • Analyze generator G for 10 iters
  • r1 <- BQL query on G
  • Analyze generator G for 10 iters
  • r2 <- BQL query on G
  • ...

and then store all the "intermediate" generators (i.e., the generators with 10 iters, 20 iters, 30 iters, ...) in a bdb. The reason we need the "intermediate" generators is largely performance: once we have them, we can decide afterwards which values to monitor as analysis progresses (such as predictive probability on a test set, or simulation quality, etc).

It is important that r2 queries a generator that has had 10 additional analysis steps applied to the generator used by r1, as opposed to an independent generator analyzed for 20 iterations.
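
As a rough illustration of the workflow above, here is a minimal sketch in plain Python. ToyGenerator, analyze, query, and run_with_snapshots are hypothetical names made up for this example and are not part of bayeslite; the only real call is copy.deepcopy. Each snapshot is a clone taken from the same running chain, so the 20-iteration snapshot is the 10-iteration one plus 10 more steps.

```python
import copy
import random


class ToyGenerator:
    """Hypothetical stand-in for a generator; not a bayeslite class."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.iterations = 0
        self.state = 0.0

    def analyze(self, iterations):
        # Each step builds on the previous state, like one chain of analysis.
        for _ in range(iterations):
            self.state += self.rng.gauss(0, 1)
            self.iterations += 1

    def query(self):
        # Placeholder for a BQL query against the current state.
        return self.state


def run_with_snapshots(generator, rounds, iters_per_round):
    """Analyze in rounds, keeping a deep copy after each round."""
    snapshots = {}
    for _ in range(rounds):
        generator.analyze(iters_per_round)
        # The clone freezes the generator (including its RNG) at this point.
        snapshots[generator.iterations] = copy.deepcopy(generator)
    return snapshots


snapshots = run_with_snapshots(ToyGenerator(seed=0), rounds=3, iters_per_round=10)
r1 = snapshots[10].query()
r2 = snapshots[20].query()
```

Because the clone includes the random state, resuming the 10-iteration snapshot for 10 more steps reproduces the 20-iteration snapshot in this toy, which is the dependence between r1 and r2 that an independently analyzed generator would not have.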

fsaad avatar Jan 11 '16 00:01 fsaad

So the purpose is to retain historical models so that you can see how answers changed over time?

riastradh-probcomp avatar Jan 11 '16 17:01 riastradh-probcomp

To implement this properly, I think we ought to just store extra bayesdb_crosscat_theta records, adding a column to that table that indicates the number of iterations. Normally you would use the most recent theta for each model, but you could also choose an older one. And normally analysis would discard the old thetas, but we could teach it to keep them and append new ones.
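
A minimal sketch of that idea, using only the standard sqlite3 module. The table toy_theta, its columns, and the helpers save_theta and load_theta are assumptions invented for illustration; they are not the actual bayesdb_crosscat_theta schema or any bayeslite API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (generator, model, iteration count).
conn.execute("""
    CREATE TABLE toy_theta (
        generator_id INTEGER NOT NULL,
        modelno INTEGER NOT NULL,
        iterations INTEGER NOT NULL,
        theta_json TEXT NOT NULL,
        PRIMARY KEY (generator_id, modelno, iterations)
    )
""")


def save_theta(conn, generator_id, modelno, iterations, theta, keep_history=True):
    """Store a theta; discard older ones unless keep_history is set."""
    if not keep_history:
        conn.execute(
            "DELETE FROM toy_theta WHERE generator_id = ? AND modelno = ?",
            (generator_id, modelno),
        )
    conn.execute(
        "INSERT INTO toy_theta VALUES (?, ?, ?, ?)",
        (generator_id, modelno, iterations, json.dumps(theta)),
    )


def load_theta(conn, generator_id, modelno, iterations=None):
    """Return the most recent theta, or the one at a given iteration count."""
    if iterations is None:
        row = conn.execute(
            "SELECT theta_json FROM toy_theta"
            " WHERE generator_id = ? AND modelno = ?"
            " ORDER BY iterations DESC LIMIT 1",
            (generator_id, modelno),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT theta_json FROM toy_theta"
            " WHERE generator_id = ? AND modelno = ? AND iterations = ?",
            (generator_id, modelno, iterations),
        ).fetchone()
    return None if row is None else json.loads(row[0])


# Append a theta after every 10 iterations of (imaginary) analysis.
for n in (10, 20, 30):
    save_theta(conn, generator_id=1, modelno=0, iterations=n, theta={"iters": n})
```

Queries would default to load_theta(conn, 1, 0), which picks the latest row, while a historical query passes iterations=10 or iterations=20 explicitly.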

riastradh-probcomp avatar Jan 11 '16 18:01 riastradh-probcomp