Roadmap of feature engineering in the COLUMN clause.
Refactor
- [ ] Port the existing feature derivation functionality to the new infrastructure. #2568
Cover multiple frameworks
- [ ] PyTorch: The new COLUMN clause syntax can describe the data transformation process for PyTorch, with consistency between training and online inference. Investigation: #2276 #2399
- [ ] XGBoost: Make sure the transform implementation can be adapted to the new COLUMN syntax, with consistency between training and online inference. Investigation: #2190
- [ ] TensorFlow: The data transformation logic is built on preprocessing layers.
- [ ] Data analysis stage. User experience: users can write a transform function in the COLUMN clause without specifying all of its parameters. The missing parameters are computed by automatic data analysis.
- [ ] New COLUMN syntax = the transform functions + DENSE/SPARSE. Users switch to the final new COLUMN syntax.
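
As a rough illustration of the data analysis stage, the sketch below fills in a missing parameter for a bucketizing transform. The function names, the choice of quantile cut points as boundaries, and the sample data are all assumptions made for this example. They are not SQLFlow APIs.

```python
import bisect
import statistics


def analyze_boundaries(values, num_buckets):
    """Hypothetical analysis step: use quantile cut points as bucket boundaries."""
    # statistics.quantiles returns num_buckets - 1 cut points.
    return statistics.quantiles(values, n=num_buckets)


def bucketize(value, boundaries):
    """Map a numeric value to a bucket index, given sorted boundaries."""
    return bisect.bisect_right(boundaries, value)


# The user only states the number of buckets; the boundaries come from the data.
ages = [18, 22, 25, 31, 37, 44, 52, 60]
boundaries = analyze_boundaries(ages, num_buckets=4)
bucket_ids = [bucketize(age, boundaries) for age in ages]
```

The same computed boundaries would have to be stored with the model so that online inference applies the identical bucketing.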
Consolidate the work above into a data transformation library. It provides a unified set of transform APIs and has a separate implementation for each framework.
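
One possible shape for such a library is a registry keyed by transform name and framework. The sketch below is only an assumption about how this could look: `register`, `get_transform`, and the backend functions are hypothetical names, and the backends are pure-Python stand-ins rather than real XGBoost or PyTorch code.

```python
import zlib
from typing import Callable, Dict, Tuple

# (transform name, framework) -> implementation. All names are hypothetical.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(name: str, framework: str):
    """Decorator that records an implementation of a transform for a framework."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[(name, framework)] = fn
        return fn
    return decorator


def get_transform(name: str, framework: str) -> Callable:
    """Look up the framework-specific implementation of a transform."""
    try:
        return _REGISTRY[(name, framework)]
    except KeyError:
        raise NotImplementedError(f"{name} is not implemented for {framework}") from None


@register("HASH", "xgboost")
def hash_for_xgboost(value, bucket_size):
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(str(value).encode("utf-8")) % bucket_size


@register("HASH", "pytorch")
def hash_for_pytorch(value, bucket_size):
    # A real backend would presumably return a tensor; a one-element list
    # stands in for it here. The id itself matches the other backend.
    return [hash_for_xgboost(value, bucket_size)]
```

Sharing one hashing rule across backends is what keeps ids consistent between training and online inference in this sketch.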
Automated Feature Engineering
- [ ] Auto-complete the transform logic based on a simplified COLUMN clause. User experience: users can write a SQLFlow statement with a simpler COLUMN clause. For example, Embedding(cat_1, 8) can be expanded to Embedding(HASH(cat_1), 8) or Embedding(VOCABULARIZE(cat_1), 8).
- [ ] SQLFlow can automatically build the feature engineering logic from scratch, based on the metadata of the source data and the model. User experience: users can write a SQLFlow statement for model training without a COLUMN clause.
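
The expansion of a simplified Embedding column could be sketched as below. The roadmap does not say how SQLFlow chooses between HASH and VOCABULARIZE, so the distinct-count rule, the `vocab_threshold` parameter, and the function name are assumptions made for illustration only.

```python
def complete_embedding(column, dimension, distinct_count, vocab_threshold=1000):
    """Expand Embedding(column, dimension) with an id-producing transform.

    Assumed rule: columns with few distinct values get a vocabulary lookup,
    and high-cardinality columns get hashed.
    """
    inner = "VOCABULARIZE" if distinct_count <= vocab_threshold else "HASH"
    return f"Embedding({inner}({column}), {dimension})"


# distinct_count would come from metadata gathered during data analysis.
low_cardinality = complete_embedding("cat_1", 8, distinct_count=50)
high_cardinality = complete_embedding("cat_1", 8, distinct_count=5_000_000)
```

With these inputs, `low_cardinality` is `Embedding(VOCABULARIZE(cat_1), 8)` and `high_cardinality` is `Embedding(HASH(cat_1), 8)`.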