
Selectors: allow results in an empty dataframe

Open eromoe opened this issue 2 years ago • 3 comments

Before working on a large PR, please check with @koaning or @MBrouns that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Suppose you want to build a semi-automatic Pipeline that routes columns by dtype. The pipeline might look like this:


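import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklego.preprocessing import IdentityTransformer, PandasTypeSelector

# NOTE: StringIndexer is not defined in this snippet; it is assumed to be a
# user-supplied transformer that maps category labels to integer indices.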

transformer = Pipeline([
    ('features', FeatureUnion(n_jobs=1, transformer_list=[
        # Part 1
        ('boolean', Pipeline([
            ('selector', PandasTypeSelector(include='bool')),
        ])),  # booleans close
        
        ('numericals', Pipeline([
            ('selector', PandasTypeSelector(include='number')),
            ('scaler', StandardScaler()),
            ('add_pca', FeatureUnion([
                ('orig', IdentityTransformer()),
                ('pca', PCA(2))
            ]))
        ])),  # numericals close
        
        # Part 2
        ('categoricals', Pipeline([
            ('selector', PandasTypeSelector(include='category')),
            ('labeler', StringIndexer()),
            ('encoder', OneHotEncoder(handle_unknown='ignore')),
        ]))  # categoricals close
    
    ])),  # features close
])  # pipeline close

A dataset may contain boolean, numerical, and categorical variables, but any of these types may also be absent. The current behaviour is to raise an exception when a selector would return an empty DataFrame. I think we could expose a parameter that lets the user choose whether an empty result is allowed.
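To make the request concrete, here is a minimal sketch of what such a switch could look like. It is not scikit-lego's implementation: `LenientTypeSelector` and its `allow_empty` parameter are hypothetical names used only for illustration, and the class is built on `pandas.DataFrame.select_dtypes` rather than on the existing selector.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class LenientTypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical dtype selector; `allow_empty` is the proposed switch."""

    def __init__(self, include=None, exclude=None, allow_empty=False):
        self.include = include
        self.exclude = exclude
        self.allow_empty = allow_empty

    def fit(self, X, y=None):
        # Remember which columns match the requested dtypes.
        selected = X.select_dtypes(include=self.include, exclude=self.exclude)
        self.feature_names_ = list(selected.columns)
        # Strict mode (the default) raises when nothing matches.
        if not self.feature_names_ and not self.allow_empty:
            raise ValueError("Provided type(s) results in empty dataframe")
        return self

    def transform(self, X):
        # With no matching columns this returns a frame with all rows and zero columns.
        return X[self.feature_names_]


# Example data without any boolean column.
df = pd.DataFrame({"age": [31, 45, 27], "income": [1.5, 2.0, 0.7]})

lenient = LenientTypeSelector(include="bool", allow_empty=True).fit(df)
empty = lenient.transform(df)      # zero columns, index preserved
numbers = LenientTypeSelector(include="number").fit_transform(df)
```

One caveat with this approach: a zero-column frame may still be rejected by later steps in the same branch (for example, `StandardScaler` validates that its input has at least one feature), so the parameter alone would likely not be enough for branches like the numericals one above.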

Fixes # (issue)

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • [ ] My code follows the style guidelines (flake8)
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation (also to the readme.md)
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added tests to check whether the new feature adheres to the sklearn convention
  • [ ] New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @koaning or @mbrouns.

eromoe, Oct 31 '22 09:10