bayeslite

Implement INSERT INTO for metamodel (i.e. incorporating rows)

Open: fsaad opened this issue 8 years ago • 1 comment

On June 28, a partial implementation of INSERT INTO was (rightfully) nixed: 5bf3d2d1ec3f71518f8ad27e6100feb3419c73e5

The current architecture, with a population p defined over a table t, suggests the following workflow.

CREATE TABLE t FROM 'foo.csv'; -- create table from .csv file
CREATE POPULATION p FOR t WITH SCHEMA ( GUESS (*) ); -- create population for `t`.
CREATE METAMODEL m FOR p; -- create metamodel
INITIALIZE 10 MODELS FOR m; -- all existing data in t incorporated into all 10 gpms
ANALYZE m FOR 10 ITERATION; -- analysis

INSERT INTO t SELECT * FROM q; -- user inserts more rows from q into the base table.
-- At this point, the metamodel `m` has no idea of the new rows. We have used
-- the workflow up to this point for analysis tasks where `q` contains the
-- held-out dataset.

-- New query telling all metamodels of `p` to rescan `t` and invoke gpm.incorporate
-- on the new rows. Actual wording can be changed.
ALTER POPULATION p RETRIEVE ROWS FROM BASE TABLE;
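
To make the "rescan `t` and invoke gpm.incorporate on the new rows" step concrete, here is a rough Python sketch using only the stdlib `sqlite3` module. It is not bayeslite code: the `incorporated_rows` bookkeeping table, the `incorporate_new_rows` helper, and the `incorporate(rowid, values)` callback are all hypothetical stand-ins for however a metamodel would actually track and incorporate rows.

```python
import sqlite3


def incorporate_new_rows(db, table, incorporate):
    """Call incorporate(rowid, values) for each row of `table` not yet seen.

    Returns the number of newly incorporated rows.
    """
    # Hypothetical bookkeeping table; not part of bayeslite's schema.
    db.execute(
        "CREATE TABLE IF NOT EXISTS incorporated_rows"
        " (tabname TEXT NOT NULL, rowid_seen INTEGER NOT NULL,"
        " PRIMARY KEY (tabname, rowid_seen))"
    )
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    new_rows = db.execute(
        "SELECT rowid, * FROM %s WHERE rowid NOT IN"
        " (SELECT rowid_seen FROM incorporated_rows WHERE tabname = ?)"
        " ORDER BY rowid" % (quoted,),
        (table,),
    ).fetchall()
    for row in new_rows:
        rowid, values = row[0], row[1:]
        incorporate(rowid, values)  # stand-in for gpm.incorporate
        db.execute(
            "INSERT INTO incorporated_rows VALUES (?, ?)", (table, rowid)
        )
    db.commit()
    return len(new_rows)


# Demo: two initial rows, then one row inserted later.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x REAL, y REAL)")
db.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 2.0), (3.0, 4.0)])
seen = []
first = incorporate_new_rows(db, "t", lambda r, v: seen.append((r, v)))
db.execute("INSERT INTO t VALUES (5.0, 6.0)")
second = incorporate_new_rows(db, "t", lambda r, v: seen.append((r, v)))
third = incorporate_new_rows(db, "t", lambda r, v: seen.append((r, v)))
```

Tracking by rowid keeps the rescan idempotent: running it again without new inserts incorporates nothing. It does assume rows in `t` are never deleted and their rowids never reused.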

fsaad, Nov 05 '16 14:11

Looks reasonable. As for the specific wording, we could do something like

ALTER POPULATION p RESAMPLE[(n)] [FROM t]

which would leave room for (a) changing a subsample to a full sample or vice versa, and (b) incorporating data from other tables. (Not necessary to implement all of that at the moment.)
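
To spell out which parts of that wording are optional, here is a small Python sketch that recognizes the proposed statement. The regex and the `parse_resample` helper are purely illustrative assumptions, a stand-in rather than bayeslite's actual BQL parser, and the statement itself is only a proposal.

```python
import re

# Hypothetical grammar: ALTER POPULATION <p> RESAMPLE [(<n>)] [FROM <t>]
RESAMPLE_RE = re.compile(
    r"^\s*ALTER\s+POPULATION\s+(?P<population>\w+)\s+RESAMPLE"
    r"(?:\s*\(\s*(?P<n>\d+)\s*\))?"       # optional subsample size
    r"(?:\s+FROM\s+(?P<table>\w+))?"      # optional source table
    r"\s*;?\s*$",
    re.IGNORECASE,
)


def parse_resample(statement):
    """Return the population, sample size, and table named in `statement`.

    `n` and `table` are None when the corresponding clause is omitted.
    """
    match = RESAMPLE_RE.match(statement)
    if match is None:
        raise ValueError("not a RESAMPLE statement: %r" % (statement,))
    n = match.group("n")
    return {
        "population": match.group("population"),
        "n": int(n) if n is not None else None,
        "table": match.group("table"),
    }


full_sample = parse_resample("ALTER POPULATION p RESAMPLE")
subsample = parse_resample("ALTER POPULATION p RESAMPLE(100) FROM t;")
```

Under this reading, omitting `(n)` would mean a full sample and omitting `FROM t` would mean the population's own base table; both defaults are assumptions.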

riastradh-probcomp, Nov 18 '16 17:11