sklearn-onnx
Conversion of PyMorphy2 preprocessing to ONNX
Hello. Could you help with converting PyMorphy2 preprocessing to ONNX, please? I've created a custom class with a method that does some text preprocessing (using re.sub()) and then lemmatizes the text using PyMorphy2. I pass this custom class to TfidfVectorizer as the preprocessor parameter. When I try to convert a pipeline containing this TfidfVectorizer, I get NotImplementedError: Custom preprocessor cannot be converted into ONNX. How can I convert this custom preprocessor to ONNX?
You may look at http://onnx.ai/sklearn-onnx/auto_tutorial/plot_icustom_converter.html to see how to implement a custom converter. However, the support for strings is very limited in standard ONNX. It only supports tokenization with regular expressions. You may need operators available in onnxruntime-extensions: https://github.com/microsoft/onnxruntime-extensions.
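Another option, offered here only as a rough sketch and not something confirmed in this thread, is to keep the cleanup and lemmatization in plain Python and run it before the text reaches the converted model, so the pipeline itself only contains a TfidfVectorizer without a custom preprocessor. The helper name `preprocess_batch`, the cleanup regex, and the `lemmatize` hook are all hypothetical; with PyMorphy2 the hook could be something like `lambda w: morph.parse(w)[0].normal_form` for a `pymorphy2.MorphAnalyzer()` instance `morph`.

```python
import re


def preprocess_batch(texts, lemmatize=lambda word: word):
    """Clean and lemmatize raw texts outside the ONNX graph.

    `lemmatize` is a hypothetical hook that maps one word to its lemma.
    It defaults to the identity so the sketch runs without PyMorphy2.
    """
    cleaned = []
    for text in texts:
        # Assumed cleanup: lowercase, then collapse every run of
        # non-word characters into a single space.
        text = re.sub(r"\W+", " ", text.lower()).strip()
        cleaned.append(" ".join(lemmatize(word) for word in text.split()))
    return cleaned


# Assumption: the vectorizer is fitted on preprocess_batch(train_texts)
# with no custom preprocessor, and the same function is applied to every
# input before it is passed to the converted model.
print(preprocess_batch(["Cats, dogs!", "  ONNX   runtime "]))
```

The trade-off is that the preprocessing is no longer part of the exported model, so the same function has to be applied wherever the model is served.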
@kos345 is this problem still blocking you? If not, we will close this issue in a few days.