sklearn-onnx

How to export a sklearn_pandas DataFrameMapper-based pipeline to ONNX format?

sharmasw opened this issue 4 years ago • 3 comments

I am trying to export a scikit-learn pipeline-based model to ONNX format. What differs from the usual setup is that I am not using the latest version of scikit-learn and that I build the pipeline with sklearn_pandas. Below is sample code with the error message. The message makes it clear that <class 'sklearn_pandas.dataframe_mapper.DataFrameMapper'> is not supported. Is there any way to achieve the same result?

scikit-learn==0.20.3

```python
# scikit-learn==0.20.3 (preprocessing.Imputer was removed in scikit-learn 0.22)
import pandas as pd
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn_pandas import DataFrameMapper
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

iris = load_iris()
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
dataX = pd.DataFrame(iris.data, columns=columns)
dataY = pd.Series(iris.target, name='target')  # 1-D target, as fit() expects

X_train, X_test, y_train, y_test = train_test_split(dataX, dataY)

# sklearn_pandas feature list: one mean imputer per column
mapper = [([col], [preprocessing.Imputer(strategy="mean")]) for col in columns]
mapper1 = DataFrameMapper(mapper)

clr = RandomForestClassifier()
pipelineList = [('feature_mapper', mapper1), ('clr', clr)]
pipelineModel = Pipeline(pipelineList)
pipelineModel.fit(X_train, y_train)

# One ONNX input of shape [None, 1] per mapped column; input names must be unique
initial_type = [(cols[0], FloatTensorType([None, 1])) for cols, _ in mapper]

# Fails with MissingShapeCalculator: skl2onnx has no converter for DataFrameMapper
onx = convert_sklearn(pipelineModel, initial_types=initial_type)
```

Error message:

```
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn_pandas.dataframe_mapper.DataFrameMapper'>'. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented in sklearn-onnx. If the converted is implemented in another library (ie: onnxmltools), you need to register the converted so that it can be used by sklearn-onnx (function update_registered_converter). If the model is not yet covered by sklearn-onnx, you may raise an issue to https://github.com/onnx/sklearn-onnx/issues to get the converter implemented or even contribute to the project. If the model is a custom model, a new converter must be implemented. Examples can be found in the gallery.
```

sharmasw, Jun 08 '20 09:06

Converting a pipeline requires skl2onnx to know a converter for every step in it, and the library only contains converters for scikit-learn objects. Is there a way to replace DataFrameMapper with a ColumnTransformer?

xadupre, Jun 09 '20 09:06

Yes, this can be achieved fairly easily with the ColumnTransformer in the latest scikit-learn release (0.23.x), but I wanted to confirm whether it can be done with the code above.

sharmasw, Jun 09 '20 10:06

Supporting DataFrameMapper would require implementing a parser, a converter and a shape calculator. I'd prefer to do that only if ColumnTransformer does not cover what you need.

xadupre, Jun 10 '20 09:06
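Rather than writing a DataFrameMapper converter, one way to keep most of the original code is to translate the DataFrameMapper-style feature list (the `mapper` list in the question) into a ColumnTransformer that skl2onnx can convert. The helper below, `mapper_features_to_column_transformer`, is hypothetical and not part of sklearn_pandas or skl2onnx. It assumes each entry is `(columns, transformer_or_list[, options])`, chains a list of transformers with `make_pipeline` the way DataFrameMapper does, maps `None` to `"passthrough"`, and ignores the options dict.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


def mapper_features_to_column_transformer(features):
    """Hypothetical helper: build a ColumnTransformer from a DataFrameMapper-style list."""
    transformers = []
    for i, feature in enumerate(features):
        columns, transformer = feature[0], feature[1]  # a third "options" item is ignored
        if transformer is None or (isinstance(transformer, list) and not transformer):
            step = "passthrough"  # column is forwarded unchanged
        elif isinstance(transformer, list):
            step = make_pipeline(*transformer)  # chain transformers in order
        else:
            step = transformer
        # The column selector is kept as-is: a list selects a 2-D slice, a string a 1-D column.
        transformers.append((f"feature_{i}", step, columns))
    return ColumnTransformer(transformers)


# Usage with a small frame containing missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
features = [(["a"], [SimpleImputer(strategy="mean")]),
            (["b"], [SimpleImputer(strategy="mean")])]
ct = mapper_features_to_column_transformer(features)
out = ct.fit_transform(df)  # column means 2.0 and 4.5 fill the gaps
print(out)
```

The resulting ColumnTransformer can replace `mapper1` in the pipeline and be converted with one `[None, 1]` input per column, provided skl2onnx has converters for every transformer in the list.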

Closing the issue; feel free to reopen it.

xadupre, Nov 24 '22 13:11