
Elevate "generators" and "populations" to the same level in BQL

Open · fsaad opened this issue 8 years ago · 0 comments

Consider

CREATE POPULATION p FOR t(
    a NUMERICAL,
    b NUMERICAL
);

CREATE GENERATOR g0 FOR p USING cgpm(
);

CREATE GENERATOR g1 FOR p USING cgpm(
    a LOGNORMAL,
    LATENT c CATEGORICAL,
    MODEL b, c GIVEN a USING baz
);

Currently, the following three BQL queries can be written:

ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g0;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g1;

Conceptually, g0, g1, and p are all the same "type" of probabilistic object. Because p answers every BQL query by averaging over g0 and g1, p itself implements the GPM interface. This holds even though g1 additionally models a LATENT variable c. I therefore suggest the following simplification:

ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM g0;
ESTIMATE PROBABILITY OF a=2 FROM g1;
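The argument that p, g0, and g1 are the same kind of object is essentially the composite pattern: a population is a GPM whose answers are an average over the GPMs it contains. The Python sketch below illustrates that idea only. The names `GPM`, `Generator`, `Population`, `logpdf`, and `estimate_density` are hypothetical and are not bayeslite's or cgpm's API. The per-column Gaussian generator is a stand-in for a real model, and the equal weighting of generators is an assumption.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Tuple


class GPM(ABC):
    """Hypothetical minimal GPM interface: anything that can score a query."""

    @abstractmethod
    def logpdf(self, targets: Dict[str, float]) -> float:
        """Log density of the observed target values."""


class Generator(GPM):
    """Hypothetical generator: independent Gaussian per column (stand-in only)."""

    def __init__(self, params: Dict[str, Tuple[float, float]]):
        self.params = params  # column name -> (mean, standard deviation)

    def logpdf(self, targets: Dict[str, float]) -> float:
        total = 0.0
        for col, x in targets.items():
            mu, sigma = self.params[col]
            total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        return total


class Population(GPM):
    """A population is itself a GPM: an equally weighted mixture of its generators."""

    def __init__(self, generators: Sequence[GPM]):
        self.generators = list(generators)

    def logpdf(self, targets: Dict[str, float]) -> float:
        # log((1/K) * sum_k exp(logpdf_k)), computed stably with log-sum-exp.
        logps = [g.logpdf(targets) for g in self.generators]
        m = max(logps)
        return m + math.log(sum(math.exp(lp - m) for lp in logps)) - math.log(len(logps))


def estimate_density(model: GPM, targets: Dict[str, float]) -> float:
    """Plays the role of ESTIMATE PROBABILITY OF ... FROM <model>."""
    return math.exp(model.logpdf(targets))


g0 = Generator({"a": (0.0, 1.0), "b": (0.0, 1.0)})
g1 = Generator({"a": (2.0, 1.0), "b": (1.0, 2.0)})
p = Population([g0, g1])

# The same call works whether the FROM target is a population or a generator.
for model in (p, g0, g1):
    print(estimate_density(model, {"a": 2.0}))
```

Because `Population` accepts any `GPM`, a caller never needs to know which kind of object it holds. That is the property the proposed `FROM g0` / `FROM g1` syntax relies on.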

My prediction is that, for now, most MML programs we write will not aggregate over different metamodels for the same population, so the additional MODELED BY clause is redundant.

fsaad · Jul 22 '16 17:07