sklearn-onnx

Conversion of PyMorphy2 preprocessing to ONNX

Status: Open · kos345 opened this issue 2 years ago · 2 comments

Hello. Could you help me convert PyMorphy2 preprocessing to ONNX, please? I've created a custom class with a method that does some text preprocessing (using re.sub()) and then lemmatizes the text using PyMorphy2. I use this custom class in TfidfVectorizer as the preprocessor parameter. When I try to convert a pipeline containing this TfidfVectorizer, I get NotImplementedError: Custom preprocessor cannot be converted into ONNX. How can I convert this custom preprocessor to ONNX?

kos345 · Mar 09 '22 11:03
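For reference, the setup described in the question might look roughly like the sketch below. It is a hypothetical reconstruction, not kos345's actual code. The class name RegexLemmaPreprocessor and the cleaning regex are assumptions. The lemmatizer is injectable so the class can run without PyMorphy2. By default it assumes pymorphy2 is installed and uses MorphAnalyzer().parse(word)[0].normal_form, which gives the normal form of the most probable parse.

```python
import re
from typing import Callable, Optional


class RegexLemmaPreprocessor:
    """Hypothetical preprocessor: clean text with re.sub(), then lemmatize each token."""

    def __init__(self, lemmatize: Optional[Callable[[str], str]] = None):
        if lemmatize is None:
            # Assumption: pymorphy2 is installed. parse(word)[0] is the most
            # probable analysis and .normal_form is its lemma.
            import pymorphy2

            morph = pymorphy2.MorphAnalyzer()

            def lemmatize(word: str) -> str:
                return morph.parse(word)[0].normal_form

        self.lemmatize = lemmatize

    def __call__(self, text: str) -> str:
        # Lowercase, then replace every run of non-word characters with a space.
        # In Python 3, \W is Unicode-aware, so Cyrillic letters are kept.
        cleaned = re.sub(r"\W+", " ", text.lower())
        # TfidfVectorizer expects the preprocessor to return a single string,
        # so lemmatize token by token and join the lemmas back together.
        return " ".join(self.lemmatize(token) for token in cleaned.split())
```

In scikit-learn, an instance would be passed as TfidfVectorizer(preprocessor=RegexLemmaPreprocessor()). Because this is arbitrary Python code rather than a known scikit-learn option, the converter has no ONNX operators to map it to. That is why conversion stops with the NotImplementedError quoted above.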

You may look at http://onnx.ai/sklearn-onnx/auto_tutorial/plot_icustom_converter.html to see how to implement a custom converter. However, support for strings in standard ONNX is very limited: it only supports tokenization with regular expressions. You may need operators from onnxruntime-extensions: https://github.com/microsoft/onnxruntime-extensions.

xadupre · Mar 28 '22 16:03

@kos345, is this problem still blocking you? If not, we will close this issue in a few days.

xiaowuhu · Jun 14 '22 05:06