bayeslite
Elevate "generators" and "populations" to the same level in BQL
Consider the following population and two generators:
CREATE POPULATION p FOR t(
a NUMERICAL,
b NUMERICAL,
);
CREATE GENERATOR g0 FOR p USING cgpm(
);
CREATE GENERATOR g1 FOR p USING cgpm(
a LOGNORMAL,
LATENT c CATEGORICAL,
MODEL b, c GIVEN a USING baz;
);
Currently, the following three BQL queries can be written:
ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g0;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g1;
Conceptually, g0, g1, and p are all the same "type" of probabilistic object. Because p averages over g0 and g1 for all BQL queries, it implements the GPM interface. This aggregation holds irrespective of the fact that g1 contains an additional model for a LATENT c. Therefore, I suggest the following simplification:
ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM g0;
ESTIMATE PROBABILITY OF a=2 FROM g1;
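The "same type" argument can be sketched in Python. This is only an illustration of the idea, not bayeslite's API: the names Generator, Population, and estimate_probability are hypothetical, the two densities for a are made-up toy choices, and uniform averaging over generators is an assumption.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution (toy model for `a` under g0)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a lognormal distribution (toy model for `a` under g1)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


class Generator:
    """Hypothetical stand-in for a single generator such as g0 or g1."""

    def __init__(self, name, density):
        self.name = name
        self._density = density

    def probability(self, value):
        return self._density(value)


class Population:
    """Hypothetical stand-in for p: exposes the same method as a generator
    and answers by averaging over its generators (uniform weights assumed)."""

    def __init__(self, name, generators):
        self.name = name
        self.generators = list(generators)

    def probability(self, value):
        total = sum(g.probability(value) for g in self.generators)
        return total / len(self.generators)


def estimate_probability(source, value):
    """Plays the role of ESTIMATE PROBABILITY OF a=value FROM source.
    The caller does not need to know whether source is p, g0, or g1."""
    return source.probability(value)


g0 = Generator("g0", normal_pdf)
g1 = Generator("g1", lognormal_pdf)
p = Population("p", [g0, g1])

for source in (p, g0, g1):
    print(source.name, estimate_probability(source, 2.0))
```

Because all three objects answer the same call, the query only needs to name its source, which is what the FROM-only form expresses.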
My prediction is that most of the MML programs we will be writing for now will not aggregate over different metamodels for the same population, so the additional MODELED BY clause is redundant.