mindsdb icon indicating copy to clipboard operation
mindsdb copied to clipboard

[CUSTOMER REQUEST] Enforce specific time granularity on order_by column

Open paxcema opened this issue 2 years ago • 2 comments

Checklist:

  • [ ] 1) [MindsDB] should always convert order_by columns to pd.datetime
  • [ ] 2) [mindsdb_sql] Add SAMPLE BY support to enforce specific time intervals
  • [ ] 3) [mindsdb_sql] Robust support for aggregation when selecting columns (e.g. AVG(prices), LATEST(ID)) for deduping.
  • [ ] 4) [MindsDB] Create views using the above.

Example

Create a view that samples on five minute intervals, aggregating the other columns with AVG and LAST:

CREATE VIEW mytrainingdata FROM (
   SELECT time, AVG(col1), LAST(col2) FROM (
         (SELECT * FROM shopify_integration.orders)
    )
    SAMPLE BY 5m
)

Train a predictor using this view:

CREATE PREDICTOR mindsdb.shopify_model
FROM views.mytrainingdata
PREDICT col1
ORDER BY time
GROUP BY col2
WINDOW 10
HORIZON 5;

Note

Keeping the Lightwood issues open for now, as we're still not 100% decided on feature design and scope.

paxcema avatar Jul 29 '22 17:07 paxcema

Meeting notes

  • Size could be an issue

  • LAST instead of LATEST

  • Could be a problem

  • Some way to handle grouping, e.g. GROUPBY(col2)

  • Storing views: not necessary? We just store and re-run the query. Could be slow at prediction time (would have to recreate view). Possibly store in a central postgres database, same place as predictors etc

  • Meeting Mon to discuss further

tomhuds avatar Aug 12 '22 14:08 tomhuds

Was discussing this one with @Ricram2 just now. An update:

We are prioritizing doing this natively in the data source whenever possible. If not, we will have some subset of the above implemented as a function in the SQL feature engineering project.

paxcema avatar Nov 04 '22 17:11 paxcema

Closing. All originally mentioned fixes have either been implemented or are part of the roadmap (FESQL project).

paxcema avatar Dec 16 '22 18:12 paxcema