feat: preprocessing transformation priorities

Open jitingxu1 opened this issue 11 months ago • 2 comments

Building on the deliverables outlined in issue #19, this issue aims to broaden the coverage of ibisml machine learning preprocessing transformations and to prioritize the key areas for improvement.

Please share your favorite ML transformation for your daily ML tasks and provide additional context as to why you find it particularly useful.

Assumptions:

  • Raw feature creation is done using ibis.
  • The data is tabular.

Priority definition:

  • P0: Essential tasks vital to model development; required before our initial release.

  • P1: Desirable tasks that can enhance the model, based on feedback and further optimization.

  • P2: Additional tasks aimed at improving the model, based on feedback and further optimization.

Priorities

| Preprocessing Module | Ibis-ml Step | sklearn | Priority | Status | Note | Model Needed |
|---|---|---|---|---|---|---|
| Encoding | CategoricalEncode | OrdinalEncoder | P0 | Done | | |
| Encoding | CountEncode | | P1 | Done | | |
| Feature Engineering | CreatePolynomialFeatures | PolynomialFeatures | P0 | Done | | |
| Non-linear Transformation | Math Transformation (log, sqrt) | | P1 | Done | ibis | |
| Standardization and Scaling | ScaleStandard | StandardScaler | P0 | Done | | KNN, MLPBased, SVM |
| Encoding | TargetEncode | TargetEncoder | P0 | Done | | |
| Feature Reduction | DropZeroVariance | VarianceThreshold | P0 | Done | | |
| Imputing | HandleUnivariateOutliers | SimpleImputer | P0 | Done | | |
| Feature Engineering | Ratio variable creation | | P0 | Done | ibis | |
| Discretization | DiscretizeKBins | KBinsDiscretizer | P0 | Done | | |
| Discretization | Feature binarization | Binarizer | P1 | Done | | |
| Standardization and Scaling | ScaleMinMax | MinMaxScaler | P0 | Done | | KNN, MLPBased, SVM |
| Custom Transformer | Custom transform | FunctionTransformer | P0 | Done | | |
| Encoding | OneHotEncode | OneHotEncoder | P0 | Done | | |
| Imputing | Outlier - Impute and capping | | P0 | Done | | Log/Linear Reg |
| Feature Reduction | Continuous Target Mutual Info | | P1 | Not started | | |
| Feature Reduction | Discrete Target Mutual Information | | P1 | Not started | | |
| Feature Engineering - Text | Count Transformer | CountVectorizer | P2 | Not started | | |
| Feature Engineering - Text | TFIDF Transformer | TfidfTransformer | P2 | Not started | | |
| Encoding | Label binarizer | LabelBinarizer | P2 | Not started | | |
| Encoding | Label encode | LabelEncoder | P2 | Not started | | |
| Standardization and Scaling | MaxAbsScaler | MaxAbsScaler | P2 | Not started | | |
| Standardization and Scaling | RobustScaler | RobustScaler | P1 | Not started | | KNN, MLPBased, SVM |
| Imputing | Missing value - Nearest Neighbor | KNNImputer | P1 | Not started | Doable | |
| Non-linear Transformation | QuantileTransformer | QuantileTransformer | P1 | Not started | | |
| Non-linear Transformation | Inverse and Logit transformation | | P2 | Not started | | |
| Imputing | Missing value - Linear reg | | P1 | Not started | Not supported | |
| Imputing | Missing value - bagged trees | | P1 | Not started | Not supported | |
| Feature Reduction | Filter col with missing rate threshold | | P1 | Not started | | |
| Feature Reduction | Filter Feature by high correlation | | P2 | Not started | Doable | |
| Non-linear Transformation | PowerTransformer | PowerTransformer | P1 | Not started | | MLPBased, SVM |
| Feature Reduction | PCA | | P1 | Not started | Not supported | |
| Imputing | Missing Value - rolling window Imputing | | P2 | Not started | | |
| Feature Engineering | Spline transformer | SplineTransformer | P1 | Not started | | |
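
For readers less familiar with the steps listed above, here is a minimal plain-Python sketch of what three of them compute: standard scaling, min-max scaling, and count encoding. It is not the ibis-ml implementation or its API; every function name here is hypothetical, and the use of the population standard deviation is an assumption of this sketch. The point it illustrates is the shared pattern: statistics are learned on training data and then applied unchanged to new data.

```python
from collections import Counter
from statistics import fmean, pstdev


# Hypothetical helper names; these are illustrations, not ibis-ml functions.
def fit_standard(values):
    """Learn the mean and population standard deviation from training values."""
    return fmean(values), pstdev(values)


def apply_standard(values, mean, std):
    """Center and scale; if std is zero, only center to avoid dividing by zero."""
    return [(v - mean) / std if std else v - mean for v in values]


def fit_min_max(values):
    """Learn the training minimum and maximum."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Map [lo, hi] onto [0, 1]; unseen values may fall outside that range."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fit_counts(categories):
    """Learn how often each category occurs in the training data."""
    return Counter(categories)


def apply_counts(categories, counts):
    """Replace each category with its training count; unseen categories get 0."""
    return [counts.get(c, 0) for c in categories]


# Fit on training data, then transform new data with the learned statistics.
train_age = [20.0, 30.0, 40.0]
mean, std = fit_standard(train_age)
scaled = apply_standard([30.0, 50.0], mean, std)

lo, hi = fit_min_max(train_age)
ranged = apply_min_max([20.0, 30.0, 50.0], lo, hi)

counts = fit_counts(["a", "b", "a"])
encoded = apply_counts(["a", "c"], counts)
```

Separating the fit and apply functions keeps test-time values from leaking into the learned statistics, which is why a value such as 50.0 can land outside the [0, 1] range after min-max scaling.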


jitingxu1, Mar 20 '24 02:03