dbt-ml-preprocessing

Support ml-driven preprocessing (e.g. PCA) with SQL

Open: matt-winkler opened this issue 3 years ago • 1 comment

I'd be interested to see this project extended to incorporate more ML-driven preprocessing into pipelines. I'm thinking of things such as PCA, k-means, and Bayesian classifiers. Is this a direction you've already considered? Where do you see this fitting on the spectrum from "preprocessing" to "inference"?

matt-winkler, Apr 09 '21 17:04

Hi @matt-winkler, apologies for such a late response - I didn't have the right notification settings on this repo.

I did dabble in a bit of SQL-driven ML a few years ago, e.g. a k-means POC here: https://github.com/jamesweakley/snowflake-ml/tree/master/k-means

Generally speaking, as long as there are not too many iterations over the data, it can work well in a SQL engine.
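To make the point about iterations concrete, here is a minimal sketch of k-means where each iteration is a single SQL pass over the data. It is not taken from the linked POC: it uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the function name `kmeans_sql`, the `points` and `centroids` tables, and the two-column `x`/`y` layout are all assumptions made for illustration.

```python
import sqlite3


def kmeans_sql(points, centroids, iterations=5):
    """Run a fixed number of k-means passes, each expressed as one SQL query.

    Hypothetical helper: `points` and `centroids` are lists of (x, y) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    con.execute("CREATE TABLE centroids (cid INTEGER PRIMARY KEY, x REAL, y REAL)")
    con.executemany("INSERT INTO points (x, y) VALUES (?, ?)", points)
    con.executemany(
        "INSERT INTO centroids (cid, x, y) VALUES (?, ?, ?)",
        [(i, x, y) for i, (x, y) in enumerate(centroids)],
    )

    for _ in range(iterations):
        # Each loop is one full scan of the data: compute squared distances,
        # keep the nearest centroid per point, then average per cluster.
        # (A point exactly equidistant from two centroids counts in both.)
        rows = con.execute(
            """
            WITH dist AS (
                SELECT p.id, p.x, p.y, c.cid,
                       (p.x - c.x) * (p.x - c.x)
                     + (p.y - c.y) * (p.y - c.y) AS d
                FROM points AS p CROSS JOIN centroids AS c
            ),
            nearest AS (
                SELECT a.id, a.x, a.y, a.cid
                FROM dist AS a
                WHERE a.d = (SELECT MIN(b.d) FROM dist AS b WHERE b.id = a.id)
            )
            SELECT cid, AVG(x), AVG(y) FROM nearest GROUP BY cid
            """
        ).fetchall()
        # Centroids with no assigned points are left where they were.
        con.executemany(
            "UPDATE centroids SET x = ?, y = ? WHERE cid = ?",
            [(x, y, cid) for cid, x, y in rows],
        )

    result = con.execute("SELECT x, y FROM centroids ORDER BY cid").fetchall()
    con.close()
    return result
```

Calling `kmeans_sql(points, initial_centroids, iterations=3)` returns the final centroids as a list of `(x, y)` tuples. The cost scales with the number of iterations, since every pass rescans the whole table, which is why a small, fixed iteration count suits a SQL engine better than running until convergence.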

I would consider it a different use case and hence a different dbt package. I also wonder if there'd be concerns around a potentially expensive operation being part of the DAG? That said, it appears to be how this package works for BigQuery.

jamesweakley, Mar 25 '22 09:03