(OneVsOneClassifier) Not able to convert sklearn model using pipeline to ONNX format for real time inferencing
I am using a OneVsOneClassifier model in scikit-learn to train and predict 150 intents. It is a multi-class classification problem.
Data:
text intents
text1 int1
text2 int2
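I convert these intents into labels using:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
# Reuse the encoder fitted on the training labels so the test labels get the same mapping.
y_test = le.transform(y_test)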
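A small sketch of why the encoder fitted on the training labels is usually reused for the test labels. The intent names here are made up for illustration; refitting on a test set that lacks some classes yields a different integer mapping.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical intent labels, not the real data set.
train_intents = ["greet", "book_flight", "cancel", "greet"]
test_intents = ["cancel", "greet"]

le = LabelEncoder()
y_train_enc = le.fit_transform(train_intents)  # learns the sorted class list
y_test_enc = le.transform(test_intents)        # reuses the same mapping

# Refitting on the test labels alone builds a new, smaller mapping.
refit_enc = LabelEncoder().fit_transform(test_intents)

print(list(le.classes_), list(y_test_enc), list(refit_enc))
```

With the reused encoder, "cancel" keeps the code it had during training; with the refitted one it does not, so metrics computed against `le.classes_` would be misaligned.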
Expectation:
Without changing the training pipeline or its parameters, I want to reduce the inference time. Currently it is slow, about 1 second per inference. So I want to convert the pipeline to ONNX format and then use it for inference on a single example.
Code:
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC
def create_pipe(clf):
    # Each pipeline uses the same column transformer.
    column_trans = ColumnTransformer(
        [('Text', TfidfVectorizer(), 'text')],
        remainder='drop')
    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])
    return pipeline
def fit_and_print(pipeline):
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)
import pandas as pd
# Convert a single input string to a one-row DataFrame.
def create_test_data(x):
    d = {'text': x}
    df = pd.DataFrame(d, index=[0])
    return df
revs = []
for idx in [948, 5717, 458]:
    cur = test.loc[idx, 'text']
    revs.append(cur)
print(revs)
revs = sam['text'].values
%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])
ONNX conversion code
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)
Error
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
How can I resolve this? Also, how do I run predictions after converting to ONNX format?
cross-referencing: https://stackoverflow.com/q/73153452/10495893
@bmreiniger any help.
The converter for OneVsOneClassifier has not been implemented yet. It should not be too complicated to do. In the meantime, OneVsRestClassifier has a converter.
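As a rough illustration of the OneVsRestClassifier suggestion, the same pipeline shape can be kept and only the wrapper swapped. This is a sketch on made-up toy data (the texts and the labels "book", "cancel", "greet" are placeholders); OvR trains one classifier per class rather than one per pair, so accuracy may differ from the OvO model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the real intent data (hypothetical values).
X_train = pd.DataFrame({"text": [
    "book a flight to paris", "reserve a plane ticket",
    "cancel my booking", "please cancel the reservation",
    "hello there", "good morning",
]})
y_train = ["book", "book", "cancel", "cancel", "greet", "greet"]

# Same column transformer as in the question; only the multiclass wrapper changes.
column_trans = ColumnTransformer(
    [("Text", TfidfVectorizer(), "text")], remainder="drop")
clf = OneVsRestClassifier(LinearSVC(random_state=42, class_weight="balanced"))
ovr_pipeline = Pipeline([("prep", column_trans), ("clf", clf)])
ovr_pipeline.fit(X_train, y_train)

ovr_pred = ovr_pipeline.predict(pd.DataFrame({"text": ["cancel the flight"]}))
print(ovr_pred)
```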
But OneVsOneClassifier is mentioned in the documentation, which is strange. When can we expect it to be implemented?
Hi @xadupre any help on this?
@pratikchhapolika I'm working on this converter now. Could you please share some of your data (X_train, y_train, X_test, y_test, etc.) so I can test it fully?
sure. I will share it by end of day today.
@xiaowuhu Please find the data-set
@xiaowuhu
I am using OneVsOneClassifier with ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')],remainder='drop')
def create_pipe(clf):
    # Each pipeline uses the same column transformer.
    column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')
    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])
    return pipeline
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
def fit_and_print(pipeline):
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))
%time fit_and_print(pipeline)
got it. will update you when I am ready.
sure @xiaowuhu Thank you!
@pratikchhapolika We finished the OvO converter. Please pip install from GitHub to try it; we haven't released it to PyPI yet.
@xiaowuhu Could you please show how I can use it in my use case or the example above? Could you please post the code here?
@xiaowuhu Also please let us know how to install this particular ovo converter and use it.
pip install git+https://github.com/onnx/sklearn-onnx
If you uninstall the previous package that came from PyPI first, it will be easier to check that the installation above is the one in use.
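For reference, a minimal sketch of how conversion and single-example inference might look once a skl2onnx build with the OvO converter is installed. It has not been run against the data in this thread. The toy utterances and integer labels are placeholders, and `build_pipeline` and `predict_with_onnx` are hypothetical helper names. The input is declared as one string column of shape [None, 1], named after the DataFrame column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline():
    # Same structure as the pipeline posted in this thread.
    column_trans = ColumnTransformer(
        [("Text", TfidfVectorizer(), "UTTERANCE")], remainder="drop")
    clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight="balanced"))
    return Pipeline([("prep", column_trans), ("clf", clf)])


def predict_with_onnx(pipeline, utterances):
    # Imported here so the scikit-learn part runs even without the ONNX packages.
    import onnxruntime as rt
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import StringTensorType

    # One string input, named like the column the pipeline was fitted on.
    initial_type = [("UTTERANCE", StringTensorType([None, 1]))]
    model_onnx = convert_sklearn(pipeline, initial_types=initial_type)

    sess = rt.InferenceSession(model_onnx.SerializeToString(),
                               providers=["CPUExecutionProvider"])
    # Feed an array of shape (n_samples, 1) to match the declared input.
    feed = {"UTTERANCE": np.array(utterances, dtype=object).reshape(-1, 1)}
    return sess.run(None, feed)[0]  # first output holds the predicted labels


# Toy stand-in for the real data (hypothetical values).
X_train = pd.DataFrame({"UTTERANCE": [
    "book a flight to paris", "reserve a plane ticket",
    "cancel my booking", "please cancel the reservation",
    "hello there", "good morning",
]})
y_train = np.array([0, 0, 1, 1, 2, 2])  # integer labels, e.g. from LabelEncoder

pipeline = build_pipeline().fit(X_train, y_train)
sk_pred = pipeline.predict(pd.DataFrame({"UTTERANCE": ["cancel the flight"]}))
print(sk_pred)
```

Calling `predict_with_onnx(pipeline, ["cancel the flight"])` should then return the ONNX prediction for the same utterance. It is worth comparing it with `pipeline.predict` on a sample of real inputs, since the converted TfidfVectorizer may not tokenize every string exactly as scikit-learn does.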
Please help with this also @xiaowuhu
@pratikchhapolika Sorry, I cannot get your sample (the code below) working even in scikit-learn, so I cannot test the converter.
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')
pipeline = Pipeline([('prep', column_trans), ('clf', clf)])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred)
return clf
we are using this test case: https://github.com/onnx/sklearn-onnx/blob/main/tests/test_sklearn_one_vs_one_classifier_converter.py
@xiaowuhu Thank you. Let me try on my notebook and check. Meanwhile is it fine to keep this open?
sure. when you are OK, please let me know.
@pratikchhapolika
Sorry to ask about the data format of X_train and the others. Does it look like a pandas DataFrame, or is it just a Python list? My goal is to make the code below work:
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')
pipeline = Pipeline([('prep', column_trans), ('clf', clf)])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred)
return clf
Hi @xiaowuhu, could you please delete the X_train data from the comment above? It might violate my company's privacy requirements.
@pratikchhapolika done. please check.