
Feature request: extensibility framework

Open scivm opened this issue 4 years ago • 3 comments

We have sklearn pipeline operations such as changing column names to lowercase, extracting columns, sorting columns, a string cleaner, an NA filler, etc.

Hummingbird errors out on the first one, which lowercases the column names (self._cols_to_lower(X)). I assume we need to implement converters for these ourselves. Is there an example for simple cases like these?

Unable to find converter for model type <class 'pipeline_components.ColumnNameLowerizer'>. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.

scivm, Nov 24 '20 18:11

Hi @scivm! I don't see ColumnNameLowerizer as an SKL operator. Is this a function that you wrote yourself and added to your SKL pipeline manually?

Can you please share some of your code that you used to generate the model? That will help us troubleshoot.

ksaur, Nov 24 '20 18:11

Thanks. It is a simple custom function added to the skl pipeline that changes columns to lowercase. There is a series of similar standard operations such as extracting columns, sorting columns, string cleaner.

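from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

import pipeline_components as pc  # module that defines ColumnNameLowerizer

model = RandomForestClassifier()
pipeline = Pipeline(
    [
        ('column_lowerizer', pc.ColumnNameLowerizer()),
        ('final_classifier', model)
    ]
)

# pipeline_components.py
class ColumnNameLowerizer:
    """Normalize all column names to lower case."""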
    def transform(self, X):
        return self._cols_to_lower(X)

    def fit(self, X, y=None):
        return self._cols_to_lower(X)

    def fit_transform(self, X, y=None):
        return self._cols_to_lower(X)

    def _cols_to_lower(self, X):
        X.columns = [col.lower() for col in X.columns]
        return X

scivm, Nov 24 '20 19:11

Thanks for posting this! So far, we have focused only on strict ONNX or sklearn pipelines.

This example highlights an important limitation in our setup. I'll change this to a feature request for us to add an extensibility framework.

For now, the only way to make your code work is either to change your code so that it does not need this additional pipeline step, or to implement a converter manually (which would also involve adding the steps to register the converter).

ksaur, Nov 24 '20 22:11
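
A minimal sketch of the first workaround mentioned in the thread (dropping the extra pipeline step), assuming the custom step only renames columns and never changes the values the classifier sees. The helper names `lowercase_columns` and `convert_classifier` and the toy data are hypothetical and not from the thread. `convert(model, 'pytorch')` is the conversion call shown in the Hummingbird README. The Hummingbird import is kept inside a function so the rest of the sketch runs without it installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def lowercase_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with lower-cased column names.

    Hypothetical stand-in for ColumnNameLowerizer, applied outside the pipeline.
    """
    return X.rename(columns=str.lower)


def convert_classifier(model, backend="pytorch"):
    """Convert only the fitted classifier with Hummingbird.

    The import is local so this sketch still runs when Hummingbird
    is not installed; the function fails only when it is called.
    """
    from hummingbird.ml import convert
    return convert(model, backend)


# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
X_raw = pd.DataFrame(rng.random((20, 3)), columns=["Age", "Height", "Weight"])
y = np.array([0, 1] * 10)

# The pandas-only preprocessing happens before, and outside of, the model.
X = lowercase_columns(X_raw)

# Fit the classifier on plain arrays so nothing depends on column names.
model = RandomForestClassifier(n_estimators=5, random_state=0)
model.fit(X.to_numpy(), y)

# With Hummingbird installed, only the classifier is converted:
# hb_model = convert_classifier(model)
# hb_predictions = hb_model.predict(X.to_numpy())
```

Because the classifier is the only object passed to Hummingbird, no converter is needed for the custom step. The trade-off is that the renaming has to be repeated by hand at inference time.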