feature_engine

Feature engineering package with sklearn-like functionality

Results: 138 feature_engine issues

**Is your feature request related to a problem? Please describe.** Methods for _data imputation_ and _data generation_. **Describe the solution you'd like** 1. It would be a good idea to...

**Is your feature request related to a problem? Please describe.** Currently `SmartCorrelatedFeatures` takes correlation measures that have a similar range (between -1 and +1) and selects features by a fixed...

new transformer

**Is your feature request related to a problem? Please describe.** Aggregating variables by one or more categories is a simple task: `df.groupby('cat')["num_var"].transform("mean")`. However, to make it compatible...

new transformer
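
The group-wise aggregation in the request above could be wrapped in a transformer with an sklearn-style `fit`/`transform` interface. The following is a minimal sketch under stated assumptions, not feature_engine's API: the class name `GroupMeanTransformer`, its parameters, and the output column name are hypothetical. It learns the group means in `fit` so they can be reapplied to unseen data.

```python
import pandas as pd


class GroupMeanTransformer:
    """Hypothetical sketch: adds the per-category mean of a numeric variable."""

    def __init__(self, group_var, num_var):
        self.group_var = group_var
        self.num_var = num_var

    def fit(self, X, y=None):
        # Learn one mean per category from the training data.
        self.group_means_ = X.groupby(self.group_var)[self.num_var].mean()
        return self

    def transform(self, X):
        # Map each row's category to the learned mean.
        # Categories not seen during fit become NaN.
        X = X.copy()
        new_col = f"{self.num_var}_mean_by_{self.group_var}"
        X[new_col] = X[self.group_var].map(self.group_means_)
        return X


df = pd.DataFrame({"cat": ["a", "a", "b"], "num_var": [1.0, 3.0, 10.0]})
out = GroupMeanTransformer("cat", "num_var").fit(df).transform(df)
print(out)
```

On the training data this yields the same values as `df.groupby('cat')["num_var"].transform("mean")`; the difference is that the learned means can be applied to a separate test set.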

**Is your feature request related to a problem? Please describe.** LightGBM has the option to return the predicted decision tree leaf for every model. From the [documentation](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html#lightgbm.LGBMRegressor.predict) > If pred_leaf=True,...

new transformer

Some sklearn transformers accept an extra parameter in `fit`, `sample_weight`, to tackle imbalanced datasets. Should we make this part of feature_engine? Example: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html Need to think if it is...

new transformer
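
For context, here is a short illustration of how `sample_weight` changes what `StandardScaler` learns in `fit`. It assumes a scikit-learn version in which `StandardScaler.fit` accepts `sample_weight`; the toy data and weights are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: one feature, two observations.
X = np.array([[0.0], [10.0]])
# Made-up weights: the first observation counts three times as much.
weights = np.array([3.0, 1.0])

unweighted = StandardScaler().fit(X)
weighted = StandardScaler().fit(X, sample_weight=weights)

# The learned mean shifts towards the heavily weighted observation.
print("unweighted mean:", unweighted.mean_[0])
print("weighted mean:", weighted.mean_[0])
```

The weighted mean is the weighted average of the observations, (3 * 0 + 1 * 10) / 4 = 2.5, against 5.0 without weights.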

I have not checked any papers on this, but in one project I wanted to use OHE and ordinal encoding to take the top categories, though not by frequency. At that time,...

new transformer

Within the examples/selection folder, we need to add 1 notebook per new transformer showcasing the functionality of each. - We need to find suitable data sets, 1 for regression and...

jupyter notebook
priority

Your library is very cool, but do you have information about performance? For example, how long will it take _MeanMedianImputer_ to process a one-million-row dataset? Can I run...

code quality
priority
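
One way to approach the performance question above is to time the operation on synthetic data. The sketch below is only a rough stand-in: it times a plain-pandas median imputation rather than `MeanMedianImputer` itself, and the function name `time_median_imputation`, the 10% missing rate, and the normally distributed data are all assumptions made for illustration.

```python
import time

import numpy as np
import pandas as pd


def time_median_imputation(n_rows=1_000_000, missing_frac=0.1, seed=0):
    """Time a plain-pandas median imputation on synthetic data.

    This is a stand-in for illustration, not a benchmark of MeanMedianImputer.
    """
    rng = np.random.default_rng(seed)
    values = rng.normal(size=n_rows)
    # Blank out roughly `missing_frac` of the values.
    values[rng.random(n_rows) < missing_frac] = np.nan
    df = pd.DataFrame({"x": values})

    start = time.perf_counter()
    median = df["x"].median()          # "fit": learn the median
    imputed = df["x"].fillna(median)   # "transform": fill missing values
    elapsed = time.perf_counter() - start
    return elapsed, imputed


elapsed, imputed = time_median_imputation()
print(f"Imputed {len(imputed):,} rows in {elapsed:.4f} s")
```

Timings depend on hardware and library versions, so the number printed is only meaningful relative to other runs on the same machine.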