sklearn-onnx

Add converter for CountVectorizer with "char_wb" analyzer

Open cppntn opened this issue 5 years ago • 1 comments

I tried to convert a CountVectorizer that uses the "char_wb" analyzer, but this error occurred:

NotImplementedError: CountVectorizer cannot be converted, only tokenizer='word' is supported. You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.

which led me to open this issue.

Thanks for your support.

cppntn avatar Apr 28 '20 21:04 cppntn

Right now, I have no easy way to fix it. Before extracting the characters, scikit-learn preprocesses the strings and collapses repeated whitespace: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L258. onnxruntime does not implement that behaviour, and the ONNX StringNormalizer operator only offers basic options: https://github.com/onnx/onnx/blob/master/docs/Operators.md.

xadupre avatar May 24 '20 14:05 xadupre
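
For readers unfamiliar with the preprocessing mentioned above, here is a minimal pure-Python sketch of what the "char_wb" analyzer roughly does: collapse runs of whitespace, split into words, pad each word with a space on both sides, then slide an n-gram window inside each padded word. The function name `char_wb_ngrams` and the module-level regex are hypothetical, written for illustration only; this approximates scikit-learn's behaviour and is not its actual code. Lowercasing and accent stripping, which CountVectorizer can also apply, are left out.

```python
import re

# Assumption: whitespace normalization replaces any run of two or more
# whitespace characters with a single space.
_WHITE_SPACES = re.compile(r"\s\s+")


def char_wb_ngrams(text, ngram_range=(1, 1)):
    """Return character n-grams taken only inside word boundaries."""
    text = _WHITE_SPACES.sub(" ", text)
    min_n, max_n = ngram_range
    ngrams = []
    for word in text.split():
        padded = " " + word + " "  # mark the word boundaries with spaces
        for n in range(min_n, max_n + 1):
            offset = 0
            ngrams.append(padded[offset:offset + n])
            while offset + n < len(padded):
                offset += 1
                ngrams.append(padded[offset:offset + n])
            if offset == 0:
                # The padded word is no longer than n: count it once and stop.
                break
    return ngrams


# Double and single spaces yield the same n-grams after normalization.
print(char_wb_ngrams("a  bc", ngram_range=(2, 2)))
print(char_wb_ngrams("a bc", ngram_range=(2, 2)))
```

The padding and whitespace handling are the steps that have no direct counterpart in the basic StringNormalizer options, which is why a converter cannot simply map this analyzer onto existing operators.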