sklearn-onnx
How to export sklearn_pandas based DataFrameMapper pipeline in ONNX format?
I am trying to export a scikit-learn pipeline-based model to ONNX format. The only differences from a standard setup are that I am not using the latest version of scikit-learn and that I build the pipeline with sklearn_pandas. Below is the sample code with the error message. From the message it is clear that <class 'sklearn_pandas.dataframe_mapper.DataFrameMapper'> is not supported. Is there any way to convert this pipeline anyway?
scikit-learn==0.20.3
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn_pandas import DataFrameMapper
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
import pandas as pd
iris = load_iris()
dataX=pd.DataFrame(iris.data,columns=['sepal_length','sepal_width','petal_length','petal_width'])
dataY=pd.DataFrame(iris.target,columns=['target'])
X_train, X_test, y_train, y_test = train_test_split(dataX, dataY)
clr = RandomForestClassifier()
# clr.fit(X_train, y_train)
mapper=[(['sepal_length'],[preprocessing.Imputer(strategy="mean")]),
        (['sepal_width'],[preprocessing.Imputer(strategy="mean")]),
        (['petal_length'],[preprocessing.Imputer(strategy="mean")]),
        (['petal_width'],[preprocessing.Imputer(strategy="mean")])]
mapper1=DataFrameMapper(mapper)
pipelineList=[('feature_mapper',mapper1),('clr',clr)]
pipelineModel=Pipeline(pipelineList)
pipelineModel.fit(X_train, y_train)
initial_type=[]
for i in mapper:
    initial_type.append(('float_input', FloatTensorType([None, 1])))
onx = convert_sklearn(pipelineModel, initial_types=initial_type)
error message:
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn_pandas.dataframe_mapper.DataFrameMapper'>'. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented in sklearn-onnx. If the converted is implemented in another library (ie: onnxmltools), you need to register the converted so that it can be used by sklearn-onnx (function update_registered_converter). If the model is not yet covered by sklearn-onnx, you may raise an issue to https://github.com/onnx/sklearn-onnx/issues to get the converter implemented or even contribute to the project. If the model is a custom model, a new converter must be implemented. Examples can be found in the gallery.
Converting a pipeline requires that skl2onnx knows a converter for every step in it. The library only contains converters for scikit-learn objects. Is there a way to replace DataFrameMapper with a ColumnTransformer?
Yes, this can be achieved fairly easily with the ColumnTransformer from the latest scikit-learn release (0.23.x), but I wanted to confirm whether it can also be done with the code above.
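For reference, the ColumnTransformer route discussed here might look like the sketch below. It assumes a scikit-learn version that ships `sklearn.compose.ColumnTransformer` and `sklearn.impute.SimpleImputer` (used in place of `preprocessing.Imputer`), and it gives each ONNX input the name of its DataFrame column. The helper `export_to_onnx`, the fixed `random_state` and `n_estimators=10` are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
iris = load_iris()
dataX = pd.DataFrame(iris.data, columns=columns)
X_train, X_test, y_train, y_test = train_test_split(
    dataX, iris.target, random_state=0)

# One mean imputer per column, mirroring the DataFrameMapper definition.
feature_mapper = ColumnTransformer(
    [(name, SimpleImputer(strategy="mean"), [name]) for name in columns])

pipelineModel = Pipeline([
    ('feature_mapper', feature_mapper),
    ('clr', RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipelineModel.fit(X_train, y_train)


def export_to_onnx(model, column_names):
    """Hypothetical helper: convert a fitted pipeline with skl2onnx.

    skl2onnx is imported lazily so the rest of the sketch runs without it.
    """
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # One named input of shape [None, 1] per DataFrame column.
    initial_type = [(name, FloatTensorType([None, 1]))
                    for name in column_names]
    return convert_sklearn(model, initial_types=initial_type)
```

With skl2onnx installed, `onx = export_to_onnx(pipelineModel, columns)` should return the ONNX model, since every step in this pipeline is a plain scikit-learn object.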
Supporting DataFrameMapper would require implementing a parser, a converter, and a shape calculator. I'd prefer to do that only if ColumnTransformer does not support what you need.
Closing the issue, feel free to reopen it.