ibis-ml
feat: preprocessing transformation priorities
Building on the deliverables outlined in issue #19, this issue aims to expand ibis-ml's coverage of machine learning preprocessing transformations and to prioritize which ones to implement first.
Please share the transformations you rely on most in your daily ML tasks, along with some context on why you find them particularly useful.
Assumptions
- Raw feature creation is done using Ibis
- The data is tabular
Priority definition:
- P0: Essential tasks that are vital for model development and must be completed before our initial release.
- P1: Desirable tasks that can enhance the model, based on feedback and further optimization.
- P2: Additional tasks aimed at improving the model, based on feedback and further optimization.
Priorities
Preprocessing Module | ibis-ml Step | sklearn | Priority | Status | Note | Model Needed |
---|---|---|---|---|---|---|
Encoding | CategoricalEncode | OrdinalEncoder | P0 | Done | | |
Encoding | CountEncode | | P1 | Done | | |
Feature Engineering | CreatePolynomialFeatures | PolynomialFeatures | P0 | Done | | |
Non-linear Transformation | Math Transformation (log, sqrt) | | P1 | Done | ibis | |
Standardization and Scaling | ScaleStandard | StandardScaler | P0 | Done | | KNN, MLPBased, SVM |
Encoding | TargetEncode | TargetEncoder | P0 | Done | | |
Feature Reduction | DropZeroVariance | VarianceThreshold | P0 | Done | | |
Imputing | HandleUnivariateOutliers | SimpleImputer | P0 | Done | | |
Feature Engineering | Ratio variable creation | | P0 | Done | ibis | |
Discretization | DiscretizeKBins | KBinsDiscretizer | P0 | Done | | |
Discretization | Feature binarization | Binarizer | P1 | Done | | |
Standardization and Scaling | ScaleMinMax | MinMaxScaler | P0 | Done | | KNN, MLPBased, SVM |
Custom Transformer | Custom transform | FunctionTransformer | P0 | Done | | |
Encoding | OneHotEncode | OneHotEncoder | P0 | Done | | |
Imputing | Outlier - impute and capping | | P0 | Done | | Log/Linear Reg |
Feature Reduction | Continuous target mutual information | | P1 | Not started | | |
Feature Reduction | Discrete target mutual information | | P1 | Not started | | |
Feature Engineering - Text | Count Transformer | CountVectorizer | P2 | Not started | | |
Feature Engineering - Text | TFIDF Transformer | TfidfTransformer | P2 | Not started | | |
Encoding | Label binarizer | LabelBinarizer | P2 | Not started | | |
Encoding | Label encode | LabelEncoder | P2 | Not started | | |
Standardization and Scaling | MaxAbsScaler | MaxAbsScaler | P2 | Not started | | |
Standardization and Scaling | RobustScaler | RobustScaler | P1 | Not started | | KNN, MLPBased, SVM |
Imputing | Missing value - nearest neighbor | KNNImputer | P1 | Not started | Doable | |
Non-linear Transformation | QuantileTransformer | QuantileTransformer | P1 | Not started | | |
Non-linear Transformation | Inverse and logit transformation | | P2 | Not started | | |
Imputing | Missing value - linear regression | | P1 | Not started | Not supported | |
Imputing | Missing value - bagged trees | | P1 | Not started | Not supported | |
Feature Reduction | Filter columns by missing-rate threshold | | P1 | Not started | | |
Feature Reduction | Filter features by high correlation | | P2 | Not started | Doable | |
Non-linear Transformation | PowerTransformer | PowerTransformer | P1 | Not started | | MLPBased, SVM |
Feature Reduction | PCA | | P1 | Not started | Not supported | |
Imputing | Missing value - rolling window imputing | | P2 | Not started | | |
Feature Engineering | Spline transformer | SplineTransformer | P1 | Not started | | |
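
For readers less familiar with the scaling rows in the table, here is a minimal pure-Python sketch of the arithmetic that standard scaling and min-max scaling perform. It is not the ibis-ml implementation or API: the helper names (`fit_standard`, `apply_standard`, `fit_min_max`, `apply_min_max`) and the sample values are made up for illustration. It assumes the population standard deviation, as sklearn's `StandardScaler` uses, and it learns the statistics on training data before applying them to new data.

```python
from statistics import fmean, pstdev


def fit_standard(values):
    """Learn (mean, population std) from training values."""
    return fmean(values), pstdev(values)


def apply_standard(values, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    # Assumption: a zero-variance column is only centered, not divided.
    scale = std if std > 0 else 1.0
    return [(v - mean) / scale for v in values]


def fit_min_max(values):
    """Learn (min, max) from training values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Apply (x - min) / (max - min) using previously learned statistics."""
    # Assumption: a constant column is only shifted, not divided.
    span = hi - lo if hi > lo else 1.0
    return [(v - lo) / span for v in values]


# Hypothetical training and scoring data.
train = [2.0, 4.0, 6.0, 8.0]
new = [5.0, 10.0]

mean, std = fit_standard(train)
standardized = apply_standard(new, mean, std)

lo, hi = fit_min_max(train)
min_maxed = apply_min_max(new, lo, hi)

print(standardized)
print(min_maxed)
```

Because the statistics come from the training data only, new values outside the training range (such as `10.0` here) can fall outside `[0, 1]` after min-max scaling. This sensitivity to feature ranges is one reason the table lists distance-based and gradient-based models (KNN, MLP-based, SVM) under "Model Needed" for the scaling steps.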