(OneVsOneClassifier) Not able to convert sklearn model using pipeline to ONNX format for real time inferencing

Open pratikchhapolika opened this issue 3 years ago • 21 comments

I am training a multi-class classification model with sklearn: a OneVsOneClassifier that predicts 150 intents.

Data:

text          intents

text1         int1
text2         int2

I convert these intents into labels using:

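from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = le.fit_transform(y_train)
# Reuse the mapping fitted on y_train; refitting on y_test can assign different integers.
y_test = le.transform(y_test)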
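
As a side note on the label encoding step: LabelEncoder assigns integers by sorted unique label, so an encoder refitted on a test split that lacks some intents can map the same intent to a different integer. The sketch below illustrates this with made-up labels (`train_labels` and `test_labels` are hypothetical, not the data from this issue).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; the test split happens to lack "int1".
train_labels = ["int1", "int2", "int3"]
test_labels = ["int2", "int3"]

le = LabelEncoder()
le.fit(train_labels)

# transform() reuses the mapping learned from the training labels.
consistent = le.transform(test_labels)

# A freshly fitted encoder builds its own mapping from the test labels alone.
refit = LabelEncoder().fit_transform(test_labels)

print(list(consistent))  # [1, 2]
print(list(refit))       # [0, 1]
```

With the refitted encoder, "int2" becomes 0 instead of 1, so a classification report computed against those labels would compare mismatched classes.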

Expectation:

Without changing the training pipeline or its parameters, I want to reduce inference time. Prediction is currently slow, about 1 second per example, so I want to convert the pipeline to ONNX format and use it for inference on one example at a time.
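
For context, a one-vs-one strategy fits one binary classifier per pair of classes, so the number of sub-models grows quadratically with the number of intents, while one-vs-rest fits one per class. The sketch below only counts them for 150 classes; whether this count explains the latency reported here is an assumption and was not measured in this thread. The helper name `n_binary_classifiers` is made up for illustration.

```python
def n_binary_classifiers(n_classes):
    """Return (one-vs-one count, one-vs-rest count) for n_classes classes."""
    ovo = n_classes * (n_classes - 1) // 2  # one classifier per unordered class pair
    ovr = n_classes                         # one classifier per class
    return ovo, ovr


ovo_count, ovr_count = n_binary_classifiers(150)
print(ovo_count, ovr_count)  # 11175 150
```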

Code:

import pandas as pd
from sklearn import metrics
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    
    # Each pipeline uses the same column transformer.  
    column_trans = ColumnTransformer(
            [('Text', TfidfVectorizer(), 'text')
             ],
            remainder='drop') 
    
    pipeline = Pipeline([('prep',column_trans),                     
                         ('clf', clf)])
     
    return pipeline

def fit_and_print(pipeline):
    
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    print(metrics.classification_report(y_test, y_pred, 
                                        target_names=le.classes_, 
                                        digits=3))

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)

# convert input to df

def create_test_data(x):
    d = {'text' : x}
    df = pd.DataFrame(d, index=[0])
    return df

revs=[]
for idx in [948, 5717, 458]:
     cur = test.loc[idx, 'text']
     revs.append(cur)
print(revs) 

revs=sam['text'].values

%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])

ONNX conversion code

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType

initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)

Error

MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

How can I resolve this? Also, how do I run predictions after converting to ONNX format?

pratikchhapolika avatar Jul 28 '22 13:07 pratikchhapolika

cross-referencing: https://stackoverflow.com/q/73153452/10495893

bmreiniger avatar Jul 28 '22 19:07 bmreiniger

@bmreiniger any help?

pratikchhapolika avatar Jul 29 '22 15:07 pratikchhapolika

The converter for OneVsOneClassifier has not been implemented yet. It should not be too complicated to do. In the meantime, OneVsRestClassifier has a converter.

xadupre avatar Aug 01 '22 15:08 xadupre

But OneVsOneClassifier is mentioned in the documentation, which is strange. When can we expect it to be implemented?

pratikchhapolika avatar Aug 01 '22 17:08 pratikchhapolika

Hi @xadupre any help on this?

pratikchhapolika avatar Aug 08 '22 06:08 pratikchhapolika

@pratikchhapolika I'm working on this converter now. Could you please share some of your data (X_train, y_train, X_test, y_test, etc.) so I can test it fully?

xiaowuhu avatar Aug 17 '22 07:08 xiaowuhu

sure. I will share it by end of day today.

pratikchhapolika avatar Aug 17 '22 09:08 pratikchhapolika

@xiaowuhu Please find the data-set

pratikchhapolika avatar Aug 17 '22 11:08 pratikchhapolika

@xiaowuhu

I am using OneVsOneClassifier with ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')],remainder='drop')

def create_pipe(clf):
    
    # Each pipeline uses the same column transformer.  
    column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')],remainder='drop') 
    
    pipeline = Pipeline([('prep',column_trans),                     
                         ('clf', clf)])
     
    return pipeline
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
def fit_and_print(pipeline):
    
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    print(metrics.classification_report(y_test, y_pred, 
                                        target_names=le.classes_, 
                                        digits=3))

%time fit_and_print(pipeline)

pratikchhapolika avatar Aug 17 '22 11:08 pratikchhapolika

got it. will update you when I am ready.

xiaowuhu avatar Aug 22 '22 23:08 xiaowuhu

sure @xiaowuhu Thank you!

pratikchhapolika avatar Aug 23 '22 11:08 pratikchhapolika

@pratikchhapolika We finished the OvO converter. Please pip install from GitHub to try it; we haven't released it to PyPI yet.

xiaowuhu avatar Sep 05 '22 00:09 xiaowuhu

@xiaowuhu Could you please show, how can I use it in my use-case or example above? Could you please post the code here?

pratikchhapolika avatar Sep 06 '22 10:09 pratikchhapolika

@xiaowuhu Also please let us know how to install this particular ovo converter and use it.

pratikchhapolika avatar Sep 07 '22 04:09 pratikchhapolika

pip install git+https://github.com/onnx/sklearn-onnx

If you uninstall the previous package from PyPI first, it will be easier to verify that the installation above took effect.

xiaowuhu avatar Sep 07 '22 05:09 xiaowuhu

Please help with this also @xiaowuhu

pratikchhapolika avatar Sep 07 '22 17:09 pratikchhapolika

@pratikchhapolika Sorry, I cannot get your sample (the code below) to work even in scikit-learn, so I cannot test the converter.

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')   
pipeline = Pipeline([('prep', column_trans), ('clf', clf)])   
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred)

we are using this test case: https://github.com/onnx/sklearn-onnx/blob/main/tests/test_sklearn_one_vs_one_classifier_converter.py

xiaowuhu avatar Sep 08 '22 07:09 xiaowuhu

@xiaowuhu Thank you. Let me try on my notebook and check. Meanwhile is it fine to keep this open?

pratikchhapolika avatar Sep 08 '22 08:09 pratikchhapolika

sure. when you are OK, please let me know.

xiaowuhu avatar Sep 08 '22 08:09 xiaowuhu

@pratikchhapolika

Sorry to ask, but what is the data format of X_train and the others? Is it a pandas DataFrame or just a Python list?

My goal is to make the code below work:

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')   
pipeline = Pipeline([('prep', column_trans), ('clf', clf)])   
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred)

Hi @xiaowuhu, could you please delete the X_train data from the comment above, as it might violate some of my company's privacy policies?

pratikchhapolika avatar Sep 09 '22 03:09 pratikchhapolika

@pratikchhapolika done. please check.

xiaowuhu avatar Sep 09 '22 05:09 xiaowuhu