sklearn-onnx
How to convert custom pipeline (categorical get_dummies) with convert_coreml?
I'm trying to save a custom sklearn pipeline as an ONNX model, but I'm getting errors in the process.
Sample code:
import copy
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn import svm
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from winmltools import convert_coreml
from IPython.display import display

# https://github.com/pandas-dev/pandas/issues/8918
class MyEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns=None):
        self.columns = columns

    def transform(self, X, y=None, **kwargs):
        # np.float was removed from recent NumPy releases; the builtin float is equivalent
        return pd.get_dummies(X, dtype=float, columns=['ID'])

    def fit(self, X, y=None, **kwargs):
        return self

# data
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]], columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# check transform
df = MyEncoder().transform(X)
display(df)

# create pipeline
pipe = Pipeline(steps=[('categorical', MyEncoder()), ('classifier', svm.SVR())])
print(type(pipe), MyEncoder().transform(X).dtypes, '\n')

# prepare models
svm_toy = svm.SVR()
svm_toy.fit(X, Y)
pipe_toy = copy.deepcopy(pipe).fit(X, Y)

# save onnx
# no problem here
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(svm_toy, initial_types=initial_type)

# something goes wrong...
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(pipe_toy, initial_types=initial_type)
The simple conversion goes well:
# no problem here
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ]
onx = convert_sklearn(svm_toy, initial_types=initial_type )
But the pipeline conversion fails:
# something goes wrong...
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ]
onx = convert_sklearn(pipe_toy, initial_types=initial_type )
with the following error:
MissingShapeCalculator: Unable to find a shape calculator for type ''.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
Am I missing something with the customized pipeline and get_dummies?
You are using a custom operator, and sklearn-onnx needs to know how to convert every step of a pipeline in order to convert the whole pipeline. The error message tells you that no converter is associated with your custom operator. You can either replace your operator with one from scikit-learn or implement a converter for it. You will find an example here: https://github.com/xadupre/onnxcustom/blob/master/examples/plot_icustom_converter.py. It is one example from the tutorial I've been working on, which is currently being rendered here: http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/tutorial.html.
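For the first option, here is a minimal sketch that swaps the get_dummies step for scikit-learn's own ColumnTransformer and OneHotEncoder, reusing the toy data from the question. It assumes the ID column sits at position 0 and that the input is passed as a float32 array to match the FloatTensorType declaration. The helper name convert_pipeline is hypothetical, and whether the conversion succeeds for your version of sklearn-onnx is something to verify rather than take from this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

# Toy data from the question.
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]],
                 columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# Use a plain float32 array so the data matches a single float tensor input.
X_np = X.to_numpy(dtype=np.float32)

# One-hot encode column 0 (ID) and pass the other columns through unchanged.
encoder = ColumnTransformer(
    transformers=[('id_onehot', OneHotEncoder(handle_unknown='ignore'), [0])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)

pipe = Pipeline(steps=[('categorical', encoder), ('classifier', SVR())])
pipe.fit(X_np, Y)

# Encoded features: two one-hot columns for ID, followed by X1 and X2.
features = pipe.named_steps['categorical'].transform(X_np)


def convert_pipeline(fitted_pipe, n_features):
    """Hypothetical helper: convert a fitted pipeline to ONNX (needs skl2onnx)."""
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    initial_type = [('X', FloatTensorType([None, n_features]))]
    return convert_sklearn(fitted_pipe, initial_types=initial_type)
```

Calling convert_pipeline(pipe, X_np.shape[1]) then mirrors the convert_sklearn call from the question. Unlike get_dummies, OneHotEncoder learns its categories during fit, so the set of output columns stays fixed at prediction time, which a static ONNX graph needs.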