
How to convert custom pipeline (categorical get_dummies) with convert_coreml?

Open • gitDawn opened this issue 4 years ago • 1 comment

I'm trying to save a custom sklearn pipeline as an ONNX model, but I'm getting errors in the process.

sample code:

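import numpy as np
import pandas as pd

from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

from sklearn import svm
from winmltools import convert_coreml

# needed for convert_sklearn and FloatTensorType used below
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

import copy
from IPython.display import display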
# https://github.com/pandas-dev/pandas/issues/8918

class MyEncoder(BaseEstimator, TransformerMixin):

    def __init__(self, columns=None):
        self.columns = columns

    def transform(self, X, y=None, **kwargs):
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        return pd.get_dummies(X, dtype=float, columns=['ID'])

    def fit(self, X, y=None, **kwargs):
        return self

# data
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]], columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# check transform
df = MyEncoder().transform(X)
display(df)

# create pipeline
pipe = Pipeline( steps=[('categorical', MyEncoder()), ('classifier', svm.SVR())] )
print(type(pipe), MyEncoder().transform(X).dtypes, '\n')

# prepare models
svm_toy  = svm.SVR()
svm_toy.fit(X,Y)
pipe_toy = copy.deepcopy(pipe).fit(X, Y)

# save onnx

# no problem here
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(svm_toy, initial_types=initial_type)

# something goes wrong...
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(pipe_toy, initial_types=initial_type)

The simple conversion goes well:

# no problem here
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(svm_toy, initial_types=initial_type)

But the pipeline conversion fails:

# something goes wrong...
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(pipe_toy, initial_types=initial_type)

with the following error:

MissingShapeCalculator: Unable to find a shape calculator for type ''.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

Am I missing something with the customized pipeline and the get_dummies?

gitDawn, Jul 08 '20 04:07

You are using a custom operator, and sklearn-onnx needs to know how to convert every piece of a pipeline before it can convert the whole pipeline. The error message says that no converter is associated with your custom operator. You can either replace your operator with one from scikit-learn or implement a converter for it. You will find an example here: https://github.com/xadupre/onnxcustom/blob/master/examples/plot_icustom_converter.py. It is one example from the tutorial I've been working on, which is currently rendered here: http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/tutorial.html.

xadupre, Jul 15 '20 13:07
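
The first option from the reply (replacing the custom operator with one from scikit-learn) could look roughly like the sketch below. It is not taken from the thread. It assumes that a ColumnTransformer wrapping OneHotEncoder is an acceptable stand-in for pd.get_dummies on the 'ID' column, and that each DataFrame column is declared as its own ONNX input. The step names 'preprocess' and 'numeric' are illustrative only. The skl2onnx import is guarded so that the scikit-learn part still runs where skl2onnx is not installed.

```python
import pandas as pd
from sklearn import svm
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

try:
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType, Int64TensorType
except ImportError:
    # skl2onnx is not installed; only the scikit-learn part will run
    convert_sklearn = None

# same toy data as in the question
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]],
                 columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# OneHotEncoder replaces pd.get_dummies for 'ID'; the numeric columns
# are passed through unchanged (assumption: this matches the intent
# of MyEncoder in the question)
preprocess = ColumnTransformer(transformers=[
    ('categorical', OneHotEncoder(handle_unknown='ignore'), ['ID']),
    ('numeric', 'passthrough', ['X1', 'X2']),
])
pipe = Pipeline(steps=[('preprocess', preprocess), ('classifier', svm.SVR())])
pipe.fit(X, Y)
preds = pipe.predict(X)

onx = None
if convert_sklearn is not None:
    # one ONNX input per DataFrame column, typed from the column dtype
    initial_types = [
        ('ID', Int64TensorType([None, 1])),
        ('X1', FloatTensorType([None, 1])),
        ('X2', FloatTensorType([None, 1])),
    ]
    onx = convert_sklearn(pipe, initial_types=initial_types)
```

Because every step is now a scikit-learn class, no custom converter has to be registered. Unlike pd.get_dummies, OneHotEncoder also remembers the categories seen during fit, so the encoded width stays the same at prediction time.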