scikit-lego
Selectors: allow an empty DataFrame as a result
Before working on a large PR, please check with @koaning or @MBrouns that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.
Description
Suppose you want to build a semi-automatic Pipeline. The pipeline may look like this:
    import pandas as pd
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklego.preprocessing import IdentityTransformer, PandasTypeSelector

    # NOTE: StringIndexer is not imported here; it is assumed to be a
    # user-supplied transformer that maps category labels to integers.

    transformer = Pipeline([
        ('features', FeatureUnion(n_jobs=1, transformer_list=[
            # Part 1: boolean and numerical columns
            ('boolean', Pipeline([
                ('selector', PandasTypeSelector(include='bool')),
            ])),  # booleans close
            ('numericals', Pipeline([
                ('selector', PandasTypeSelector(include='number')),
                ('scaler', StandardScaler()),
                # Keep the scaled columns and append two PCA components
                ('add_pca', FeatureUnion([
                    ('orig', IdentityTransformer()),
                    ('pca', PCA(2))
                ]))
            ])),  # numericals close
            # Part 2: categorical columns
            ('categoricals', Pipeline([
                ('selector', PandasTypeSelector(include='category')),
                ('labeler', StringIndexer()),
                ('encoder', OneHotEncoder(handle_unknown='ignore')),
            ]))  # categoricals close
        ])),  # features close
    ])  # pipeline close
The input may contain boolean, numerical and categorical variables, but any of these types may also be absent. The current behaviour is to raise an exception when a selector returns an empty DataFrame. I think we can expose a parameter that lets the user choose whether an empty result is allowed.
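To make the proposal concrete, here is a minimal sketch of what such a switch could look like. The class `LenientTypeSelector` and its `allow_empty` parameter are hypothetical names made up for illustration; this is not the existing `PandasTypeSelector` API, and the actual implementation in this PR may differ.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class LenientTypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical dtype selector; illustration only, not scikit-lego code."""

    def __init__(self, include=None, exclude=None, allow_empty=False):
        self.include = include
        self.exclude = exclude
        self.allow_empty = allow_empty  # assumed name for the proposed parameter

    def fit(self, X, y=None):
        selected = X.select_dtypes(include=self.include, exclude=self.exclude)
        # Strict mode: fail when no column matches the requested dtypes.
        if selected.shape[1] == 0 and not self.allow_empty:
            raise ValueError(f"No columns match include={self.include!r}")
        self.columns_ = list(selected.columns)
        return self

    def transform(self, X):
        # With an empty column list this yields a frame with rows but no columns.
        return X[self.columns_]


df = pd.DataFrame({"age": [31, 45, 27], "income": [1.5, 2.0, 0.7]})

# Strict mode (the default) raises because there are no boolean columns.
strict_failed = False
try:
    LenientTypeSelector(include="bool").fit(df)
except ValueError:
    strict_failed = True

# Lenient mode returns an empty selection instead of raising.
empty = LenientTypeSelector(include="bool", allow_empty=True).fit_transform(df)
numbers = LenientTypeSelector(include="number").fit_transform(df)
```

Note that downstream steps may still need attention: estimators such as `StandardScaler` generally reject input with zero features, so a branch that selects nothing would likely need to be skipped or handled separately.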
Fixes # (issue)
Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
- [ ] My code follows the style guidelines (flake8)
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation (also to the readme.md)
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added tests to check whether the new feature adheres to the sklearn convention
- [ ] New and existing unit tests pass locally with my changes
If you feel your PR is ready for a review, ping @koaning or @mbrouns.