bayeslite
Implement INSERT INTO for metamodel (i.e. incorporating rows)
On June 28, a partial implementation of `INSERT INTO` was (rightfully) nixed: 5bf3d2d1ec3f71518f8ad27e6100feb3419c73e5
The current architecture, with population `p` and table `t`, suggests the following workflow.
CREATE TABLE t FROM 'foo.csv'; -- create table from .csv file
CREATE POPULATION p FOR t WITH SCHEMA ( GUESS (*) ); -- create population for `t`.
CREATE METAMODEL m FOR p; -- create metamodel
INITIALIZE 10 MODELS FOR m; -- all existing data in t incorporated into all 10 gpms
ANALYZE m FOR 10 ITERATION; -- analysis
INSERT INTO t SELECT * FROM q; -- user inserts more rows from q into the base table
-- At this point, the metamodel `m` knows nothing about the new rows. So far we have
-- used this workflow for analysis tasks where `q` contains the held-out dataset.
-- Proposed new query: tell all metamodels of `p` to rescan `t` and invoke
-- gpm.incorporate on the new rows. The actual wording can be changed.
ALTER POPULATION p RETRIEVE ROWS FROM BASE TABLE;
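To make the intended rescan semantics concrete, here is a minimal sketch in plain Python and sqlite3. It is not bayeslite code: `ToyGpm`, `rescan_base_table`, and the bookkeeping table `p_rows` are hypothetical names made up for illustration, and the only assumption is that the population can remember which base-table rowids it has already passed to `incorporate`.

```python
import sqlite3


class ToyGpm:
    """Hypothetical stand-in for a GPM; it only records incorporated rows."""

    def __init__(self):
        self.rows = {}

    def incorporate(self, rowid, values):
        self.rows[rowid] = values


def rescan_base_table(conn, gpms):
    """Incorporate rows of t that are not yet listed in p_rows into every GPM."""
    new_rows = conn.execute(
        "SELECT rowid, * FROM t"
        " WHERE rowid NOT IN (SELECT base_rowid FROM p_rows)"
        " ORDER BY rowid"
    ).fetchall()
    for row in new_rows:
        rowid, values = row[0], tuple(row[1:])
        for gpm in gpms:
            gpm.incorporate(rowid, values)
        # Record the row so that a later rescan skips it.
        conn.execute("INSERT INTO p_rows (base_rowid) VALUES (?)", (rowid,))
    return [row[0] for row in new_rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL, y REAL)")
conn.execute("CREATE TABLE q (x REAL, y REAL)")
# Hypothetical bookkeeping table: rowids of t the population already knows.
conn.execute("CREATE TABLE p_rows (base_rowid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 2.0), (3.0, 4.0)])
conn.executemany("INSERT INTO q VALUES (?, ?)", [(5.0, 6.0)])

gpms = [ToyGpm() for _ in range(10)]
first = rescan_base_table(conn, gpms)    # initial data in t
conn.execute("INSERT INTO t SELECT * FROM q")
second = rescan_base_table(conn, gpms)   # only the rows that came from q
```

Here `first` covers the rows that were in `t` at initialization, and `second` covers only the row inserted from `q`, which is the behavior the proposed `ALTER POPULATION` query would need.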
Looks reasonable. As for the specific wording, we could do something like
ALTER POPULATION p RESAMPLE[(n)] [FROM t]
which would leave room for (a) changing a subsample to a full sample or vice versa, and (b) incorporating data from other tables. (Not necessary to implement all of that at the moment.)
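One possible reading of the optional `(n)` argument, sketched in plain Python and sqlite3 rather than bayeslite: with no `n` the population takes every row of the source table, and with `n` it takes a uniform subsample of that size. The function name `resample_rowids`, the `seed` parameter, and the uniform-sampling choice are all assumptions for illustration; the thread does not specify them.

```python
import random
import sqlite3


def resample_rowids(conn, table, n=None, seed=0):
    """Return the rowids a population would use after RESAMPLE[(n)] [FROM table].

    Hypothetical semantics: n=None means a full sample; otherwise draw a
    uniform subsample of n rows without replacement.
    """
    quoted = '"' + table.replace('"', '""') + '"'
    rowids = [
        row[0]
        for row in conn.execute("SELECT rowid FROM %s ORDER BY rowid" % quoted)
    ]
    if n is None or n >= len(rowids):
        return rowids
    return sorted(random.Random(seed).sample(rowids, n))


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (x REAL)")
demo.executemany("INSERT INTO t VALUES (?)", [(float(i),) for i in range(5)])

full = resample_rowids(demo, "t")        # subsample -> full sample
sub = resample_rowids(demo, "t", n=2)    # full sample -> subsample
```

Passing a different table name corresponds to case (b), incorporating data from another table.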