mindsdb
mindsdb copied to clipboard
[CUSTOMER REQUEST] Enforce specific time granularity on order_by column
Checklist:
- [ ] 1) [MindsDB] should always convert order_by columns to
pd.datetime
- [ ] 2) [mindsdb_sql] Add
SAMPLE BY
support to enforce specific time intervals - [ ] 3) [mindsdb_sql] Robust support for aggregation when selecting columns (e.g.
AVG(prices), LATEST(ID)
) for deduping. - [ ] 4) [MindsDB] Create views using the above.
Example
Create a view that samples on five minute intervals, aggregating the other columns with AVG
and LAST
:
CREATE VIEW mytrainingdata FROM (
SELECT time, AVG(col1), LAST(col2) FROM (
(SELECT * FROM shopify_integration.orders)
)
SAMPLE BY 5m
)
Train a predictor using this view:
CREATE PREDICTOR mindsdb.shopify_model
FROM views.mytrainingdata
PREDICT col1
ORDER BY time
GROUP BY col2
WINDOW 10
HORIZON 5;
Note
Keeping the Lightwood issues open for now, as we're still not 100% decided on feature design and scope.
Meeting notes
-
Size could be an issue
-
LAST instead of LATEST
-
Could be a problem
-
Some way to handle grouping, e.g. GROUPBY(col2)
-
Storing views: not necessary? We just store and re-run the query. Could be slow at prediction time (would have to recreate view). Possibly store in a central postgres database, same place as predictors etc
-
Meeting Mon to discuss further
Was discussing this one with @Ricram2 just now. An update:
We are prioritizing doing this natively in the data source whenever possible. If not, we will have some subset of the above implemented as a function in the SQL feature engineering
project.
Closing. All originally mentioned fixes have either been implemented or are part of the roadmap (FESQL project).