dbt-ml-preprocessing
Support ML-driven preprocessing (e.g. PCA) with SQL
I'd be interested to see an extension of this project to incorporate more ML-driven preprocessing into the pipelines. I'm thinking of things such as PCA, k-means, and Bayesian classifiers. Is this a direction you've already considered? Where do you see this fitting into the spectrum from "preprocessing" to "inference"?
Hi @matt-winkler, apologies for such a late response. I didn't have the right notification settings on this repo.
I did dabble in a bit of SQL-driven ML a few years ago, e.g. a k-means POC here: https://github.com/jamesweakley/snowflake-ml/tree/master/k-means
Generally speaking, SQL-driven ML can work well in a SQL engine as long as the algorithm doesn't need too many iterations over the data.
I would consider it a different use case and hence a different dbt package. I also wonder if there'd be concerns around a potentially expensive operation being part of the DAG? That said, it appears to be how this package works for BigQuery.
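To illustrate the point about iterations, here is a minimal Python sketch (not the code from the linked POC) of k-means written as repeated "assign, then aggregate" passes. The function names `kmeans_step` and `kmeans` and the sample points are hypothetical. The idea is that each pass corresponds roughly to one join plus one GROUP BY over the whole table, so the cost in a SQL engine would scale with the number of passes.

```python
# Hypothetical sketch: k-means as repeated full passes over the data.
# Each call to kmeans_step is roughly one JOIN (nearest centroid)
# followed by one GROUP BY (mean per cluster) in SQL terms.

def kmeans_step(points, centroids):
    """Run one pass: assign each point to its nearest centroid, then average."""
    sums = [[0.0] * len(centroids[0]) for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # Assignment: pick the centroid with the smallest squared distance.
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        counts[nearest] += 1
        for d, value in enumerate(p):
            sums[nearest][d] += value
    # Aggregation: the mean of each cluster; an empty cluster keeps its centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iterations=10):
    """Return (centroids, passes), where passes counts full scans of the data."""
    passes = 0
    for _ in range(max_iterations):
        updated = kmeans_step(points, centroids)
        passes += 1
        if updated == centroids:  # converged: another pass would change nothing
            break
        centroids = updated
    return centroids, passes


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centroids, passes = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids, passes)
```

On this toy data the loop converges after two passes. Capping `max_iterations` is one way to bound the cost of such a step if it were to run as part of a DAG.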