feature_engine issues

Imputation and data generation methods request

2

**Is your feature request related to a problem? Please describe.** Methods for _data imputation_ and _data generation_. **Describe the solution you'd like** 1. It would be a good idea to...

papachristoumarios

feat: Custom threshold in SmartCorrelatedFeatures

9

**Is your feature request related to a problem? Please describe.** Currently, `SmartCorrelatedFeatures` takes correlation measures that share a similar range (between -1 and +1) and selects features by a fixed...

TremaMiguel

new transformer
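The request is truncated in this listing, so the following is only a minimal sketch under stated assumptions. It assumes the goal is to let users pass any pairwise measure as a callable (whose range need not be [-1, 1]) together with their own threshold. The helper `select_uncorrelated` is hypothetical and is not feature_engine's API. It also uses a simplified greedy rule: a feature is kept unless its score against an already-kept feature exceeds the threshold.

```python
import numpy as np
import pandas as pd


def select_uncorrelated(df, measure, threshold):
    """Hypothetical helper: keep a feature only if its score against every
    feature kept so far is at most `threshold`.

    `measure` is any callable taking two 1-D arrays and returning a float,
    where higher means "more similar"; its range is up to the user.
    """
    kept = []
    for col in df.columns:
        x = df[col].to_numpy()
        # Greedy rule: compare only against features already kept.
        if all(measure(x, df[k].to_numpy()) <= threshold for k in kept):
            kept.append(col)
    return df[kept]


def abs_pearson(x, y):
    # Example measure: absolute Pearson correlation.
    return abs(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "a_noisy": a + rng.normal(scale=0.01, size=200),  # near-copy of "a"
    "b": rng.normal(size=200),                         # independent of "a"
})

print(list(select_uncorrelated(df, abs_pearson, threshold=0.8).columns))
# ['a', 'b']
```

Because the measure and the threshold are both user-supplied, the same helper would work with a measure on a different scale, such as a mutual-information score, provided the threshold is chosen on that scale.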

feat: Group Transformer

7

**Is your feature request related to a problem? Please describe.** Aggregating variables by one or more categorical variables is a simple task: `df.groupby('cat')["num_var"].transform("mean")`; however, to make it compatible...

TremaMiguel

new transformer
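One reason a dedicated transformer helps is that `groupby(...).transform("mean")` computes the statistic on whatever frame it is called on. If it is applied to test data, it uses the test data's own group means. A transformer in the fit/transform style would instead learn the group statistics in `fit` and map them in `transform`. The following is a minimal sketch assuming a hypothetical class name `GroupMeanTransformer`, the mean as the only aggregation, and NaN for groups unseen during `fit`. It is a plain class. Inheriting from sklearn's `BaseEstimator` and `TransformerMixin` would add `get_params` and `fit_transform`.

```python
import pandas as pd


class GroupMeanTransformer:
    """Hypothetical transformer: adds the per-group mean of `num_var`,
    learned in `fit` and mapped onto new data in `transform`."""

    def __init__(self, group_cols, num_var):
        self.group_cols = list(group_cols)  # one or more categorical columns
        self.num_var = num_var

    def fit(self, X, y=None):
        # Learn the group means from the training data only.
        self.group_means_ = (
            X.groupby(self.group_cols)[self.num_var]
            .mean()
            .rename(f"{self.num_var}_mean")
            .reset_index()
        )
        return self

    def transform(self, X):
        # Left merge keeps X's row order; unseen groups get NaN.
        out = X.merge(self.group_means_, on=self.group_cols, how="left")
        out.index = X.index  # restore the original index
        return out


train = pd.DataFrame({"cat": ["a", "a", "b", "b"], "num_var": [1.0, 3.0, 10.0, 20.0]})
test = pd.DataFrame({"cat": ["b", "a", "c"], "num_var": [0.0, 0.0, 0.0]})

gm = GroupMeanTransformer(["cat"], "num_var").fit(train)
print(gm.transform(test)["num_var_mean"].tolist())
# [15.0, 2.0, nan]
```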

feat: Gradient Boosting Tree predicted leaf as feature

4

**Is your feature request related to a problem? Please describe.** LightGBM has the option to return the predicted leaf of every decision tree in the model. From the [documentation](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html#lightgbm.LGBMRegressor.predict) > If pred_leaf=True,...

TremaMiguel

new transformer
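The linked LightGBM docs describe `predict(X, pred_leaf=True)`. The following sketch swaps in scikit-learn's `GradientBoostingRegressor.apply`, which likewise returns the leaf index that each sample reaches in each tree, so it runs without LightGBM installed. Each (tree, leaf) pair is then treated as a category and one-hot encoded, which is one common way to turn leaves into features for a downstream model. The data and model settings are illustrative assumptions, and no feature_engine class is implied.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

# Fit a small boosted ensemble; every sample lands in one leaf per tree.
gbm = GradientBoostingRegressor(n_estimators=20, max_depth=3, random_state=0)
gbm.fit(X, y)

# apply() returns leaf indices with shape (n_samples, n_estimators).
leaves = gbm.apply(X)

# One-hot encode the leaf index of each tree; unseen leaves at transform
# time are encoded as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaves)  # sparse matrix

print(leaves.shape)  # (500, 20)
```

Each encoded row contains exactly one active column per tree. In practice, the trees and the downstream model are often fit on separate splits of the training data to limit overfitting.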

automate linting with pre-commit

2

solegalli

code quality

should we make sample_weight part of fit?

Some sklearn transformers accept an extra parameter in `fit`, `sample_weight`, which can help tackle imbalanced datasets. Should we make this part of feature_engine? Example: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html Need to think if it is...

solegalli

new transformer
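For a mean-based transformer such as a scaler or a mean imputer, accepting `sample_weight` in `fit` would mainly change the learned statistics: they become weighted averages, so each row contributes in proportion to its weight. The following is a minimal sketch with NumPy only. The helper `weighted_mean_std` is hypothetical. It uses the population variance (no degrees-of-freedom correction), which is an assumption about the intended convention.

```python
import numpy as np


def weighted_mean_std(X, sample_weight=None):
    """Per-column mean and standard deviation, optionally weighted.

    With sample_weight=None every row counts equally; otherwise row i
    contributes in proportion to sample_weight[i].
    """
    X = np.asarray(X, dtype=float)
    mean = np.average(X, axis=0, weights=sample_weight)
    var = np.average((X - mean) ** 2, axis=0, weights=sample_weight)
    return mean, np.sqrt(var)


X = np.array([[1.0], [1.0], [1.0], [5.0]])
# Up-weight the rare row so it counts as much as the other three combined.
w = np.array([1.0, 1.0, 1.0, 3.0])

print(weighted_mean_std(X)[0])  # [2.]
mean, std = weighted_mean_std(X, w)
print(mean, std)  # [3.] [2.]
```

A weighted mean imputer would fill missing values with `mean` computed this way, rather than with the unweighted column mean.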

One-hot encoder that selects top categories by a criterion other than frequency

3

I haven't checked papers on this, but in one project I wanted to use OHE and ordinal encoding to keep the top categories, ranked by something other than frequency. At that time,...

indymnv

new transformer
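The issue does not say which criterion was used, so the following sketch assumes a hypothetical one: ranking categories by the mean of the target instead of by how often they occur. The helpers `top_categories_by_target` and `one_hot_top` are illustrative names, not feature_engine API. Because the ranking uses the target, it should be learned on training data only, just as a `fit` step would do.

```python
import pandas as pd


def top_categories_by_target(x, y, n):
    """Hypothetical helper: rank categories by the mean of the target
    (instead of by frequency) and return the top `n`."""
    means = y.groupby(x).mean()
    return means.sort_values(ascending=False).index[:n].tolist()


def one_hot_top(x, categories):
    # One column per selected category; other categories become all zeros.
    return pd.DataFrame(
        {f"{x.name}_{c}": (x == c).astype(int) for c in categories},
        index=x.index,
    )


x = pd.Series(["a", "a", "a", "b", "c", "c"], name="colour")
y = pd.Series([0, 0, 1, 1, 1, 0])

top = top_categories_by_target(x, y, n=2)
print(top)  # ['b', 'c']
```

By frequency, the top two categories would be `a` and `c`. By target mean, they are `b` (1.0) and `c` (0.5), so the choice of criterion changes which columns the encoder creates.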

1-2 minute YouTube videos showing how to use a class

Would these be useful?

solegalli

docs

add jupyter notebooks for feature selection

2

Within the examples/selection folder, we need to add one notebook per new transformer, showcasing its functionality. - We need to find suitable datasets, one for regression and...

solegalli

jupyter notebook

priority

Question about performance

7

Your library is very cool, but do you have information about performance? For example, how long will it take _MeanMedianImputer_ to process a dataset with one million rows? Can I run...

pgschr

code quality

priority
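The listing does not include an answer, and timings depend on hardware. One way to get a rough number is to time the equivalent operation on synthetic data. The following sketch times plain pandas mean imputation (learn one mean per column, then fill the NaNs) as a proxy. It does not call _MeanMedianImputer_ itself. To time the real class, one would swap the two timed lines for the transformer's own `fit`/`transform` calls. The column count and missing fraction are arbitrary assumptions.

```python
import time

import numpy as np
import pandas as pd


def time_mean_imputation(n_rows, n_cols=5, missing_frac=0.1, seed=0):
    """Time plain-pandas mean imputation on synthetic data.

    This is a proxy benchmark; it does not use feature_engine.
    """
    rng = np.random.default_rng(seed)
    values = rng.normal(size=(n_rows, n_cols))
    values[rng.random(size=values.shape) < missing_frac] = np.nan
    df = pd.DataFrame(values, columns=[f"x{i}" for i in range(n_cols)])

    start = time.perf_counter()
    means = df.mean()           # "fit": learn one mean per column
    imputed = df.fillna(means)  # "transform": fill NaN with the column mean
    elapsed = time.perf_counter() - start
    return imputed, elapsed


imputed, seconds = time_mean_imputation(1_000_000)
print(f"1M rows x 5 columns: {seconds:.3f} s")
```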
