NimbusML

ONNX model for NGramFeaturizer doesn't output actual tokens as column names

Open · ganik opened this issue 5 years ago · 0 comments

Repro:

```python
from nimbusml.datasets import get_dataset
from nimbusml import FileDataStream
from nimbusml.preprocessing import OnnxRunner
from nimbusml.feature_extraction.text import NGramFeaturizer
from nimbusml.feature_extraction.text.extractor import Ngram

path = get_dataset("wiki_detox_train").as_filepath()
data = FileDataStream.read_csv(path, sep='\t')

# Word n-grams only, with no text normalization
transformer = NGramFeaturizer(word_feature_extractor=Ngram(),
                              char_feature_extractor=None,
                              text_case='None',
                              keep_diacritics=True,
                              keep_numbers=True,
                              keep_punctuations=True,
                              columns={'features': ['SentimentText']})
# Native NimbusML transform
print(transformer.fit_transform(data))

# Export to ONNX and score the same data through the exported model
transformer.export_to_onnx("test.onnx", 'com.microsoft.ml')
onnx_runner = OnnxRunner(model_file="test.onnx")
print(onnx_runner.fit_transform(data))
```

Output of the native transform:

```
     Sentiment                                      SentimentText  ...  features.douchiest  features.award.
0            1   ==RUDE== Dude, you are rude upload that carl p...  ...            0.000000         0.000000
1            1    == OK! == IM GOING TO VANDALIZE WILD ONES WIK...  ...            0.000000         0.000000
2            1   Stop trolling, zapatancas, calling me a liar m...  ...            0.000000         0.000000
3            1    ==You're cool== You seem like a really cool g...  ...            0.000000         0.000000
4            1   ::::: Why are you threatening me? I'm not bein...  ...            0.000000         0.000000
..         ...                                                 ...  ...                 ...              ...
233          1                           WHy are you ugly and fat?  ...            0.000000         0.000000
234          1  ::Is that so? Than why so many people question...  ...            0.000000         0.000000
235          1   Yep, he be the mouthpiece, but his law still s...  ...            0.000000         0.000000
236          1   **And we have a winner for the douchiest comme...  ...            0.316228         0.316228
237          0   harmony between people of this village, or may...  ...            0.000000         0.000000

[238 rows x 3707 columns]
```

Output of the ONNX model:

```
     Sentiment                                      SentimentText  ...  features.onnx.3703  features.onnx.3704
0            1   ==RUDE== Dude, you are rude upload that carl p...  ...            0.000000            0.000000
1            1    == OK! == IM GOING TO VANDALIZE WILD ONES WIK...  ...            0.000000            0.000000
2            1   Stop trolling, zapatancas, calling me a liar m...  ...            0.000000            0.000000
3            1    ==You're cool== You seem like a really cool g...  ...            0.000000            0.000000
4            1   ::::: Why are you threatening me? I'm not bein...  ...            0.000000            0.000000
..         ...                                                 ...  ...                 ...                 ...
233          1                           WHy are you ugly and fat?  ...            0.000000            0.000000
234          1  ::Is that so? Than why so many people question...  ...            0.000000            0.000000
235          1   Yep, he be the mouthpiece, but his law still s...  ...            0.000000            0.000000
236          1   **And we have a winner for the douchiest comme...  ...            0.316228            0.316228
237          0   harmony between people of this village, or may...  ...            0.000000            0.000000

[238 rows x 3709 columns]
```

The native transform names each n-gram slot after its token (e.g. `features.douchiest`, `features.award.`), but the exported ONNX model outputs positional names instead (`features.onnx.3703`, `features.onnx.3704`), even though the values in the rows shown are identical. The ONNX output also has 3709 columns instead of 3707.
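As a possible stopgap until the slot names survive export, here is a minimal sketch of a hypothetical helper (`onnx_slot_rename_map` is not part of NimbusML) that builds a rename mapping from the positional `features.onnx.<i>` columns back to the token-named `features.*` columns of the native output. It assumes that ONNX slot `i` corresponds to the `i`-th `features.*` column of the native transform. The matching values in row 236 are consistent with this but do not prove it, so verify on your own data before relying on it.

```python
import re


def onnx_slot_rename_map(native_columns, onnx_columns, output_column="features"):
    """Map '<output_column>.onnx.<i>' names to native slot names.

    Hypothetical helper (not a NimbusML API). ASSUMPTION: ONNX slot i
    corresponds to the i-th '<output_column>.*' column of the native
    NGramFeaturizer output.
    """
    prefix = output_column + "."
    # Native slot names in column order, e.g. 'features.douchiest'.
    native_slots = [c for c in native_columns if c.startswith(prefix)]
    pattern = re.compile(re.escape(prefix) + r"onnx\.(\d+)$")
    mapping = {}
    for col in onnx_columns:
        match = pattern.match(col)
        if match is None:
            continue  # not a positional ONNX slot column; leave it unchanged
        idx = int(match.group(1))
        if idx < len(native_slots):
            mapping[col] = native_slots[idx]
    return mapping
```

With `native_df = transformer.fit_transform(data)` and `onnx_df = onnx_runner.fit_transform(data)` from the repro (both pandas DataFrames), the mapping could be applied with `onnx_df.rename(columns=onnx_slot_rename_map(native_df.columns, onnx_df.columns))`. Columns that do not match the positional pattern, or whose index has no native counterpart, are left as-is.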

ganik · Feb 11 '20 15:02