sqlflow icon indicating copy to clipboard operation
sqlflow copied to clipboard

Roadmap of feature engineering in COLUMN clause.

Open brightcoder01 opened this issue 5 years ago • 0 comments

Refactor

  • [ ] Port the existed feature derivation functionality into the new infrastructure. #2568

Cover multiple frameworks

  • [ ] PyTorch: New COLUMN clause syntax can describe the data transformation process of PyTorch. Consistency between training and online inference. Investigation: #2276 #2399

  • [ ] XGBoost: Make sure the transform implementation can be adapted into the new column syntax. Consistency between training and online inference. Investigation: #2190

  • [ ] TensorFlow: The data transformation logic is built upon preprocessing layers.

  • [ ] Data Analysis stage. User experience: Users can write the transform function without full parameters in COLUMN clause. The parameters can be calculated from automatic data analysis.

  • [ ] New COLUMN Syntax = The transform functions + DENSE/SPARSE. User switch to the final new COLUMN syntax.

Summarize a data transformation library. It provide a unified set of transform API and have different implementations for various frameworks.

Automated Feature Engineering

  • [ ] Auto complete the transform logic based on the simplified COLUMN clause. User experience: User can write a SQLFlow statement with a simpler COLUMN clause. For example: Embedding(cat_1, 8) can be derived to Embedding(HASH(cat_1), 8) or Embedding(VOCABULARIZE(cat_1), 8)

  • [ ] SQLFlow can automatically build the feature engineering logic according to the metadata from source data and model from scratch. User experience: User can write a SQLFlow statement without COLUMN clause for model training.

brightcoder01 avatar Jul 23 '20 07:07 brightcoder01