Gani Nazirov
Changes needed to convert DeBERTa to ONNX
Repro:

    from nimbusml.datasets import get_dataset
    from nimbusml.preprocessing import OnnxRunner, ToKey

    iris_df = get_dataset("iris").as_df()
    iris_df = iris_df.drop(['Label'], axis=1)
    transform = ToKey()
Currently, if you use DatasetTransformer with a predictor model, it outputs all the hidden fields. It needs to output only Score and, optionally, PredictedLabel (if it is a classifier, for example) and Probabilities if available.
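The desired behaviour can be sketched on the Python side with plain pandas. This is only an illustration of the column filtering, not the nimbusml implementation: the helper `keep_prediction_columns`, the sample frame, and the exact column names and prefixes are assumptions made for the example.

```python
import pandas as pd

# Assumed column names, modeled on the fields named in the issue text.
WANTED_EXACT = ("Score", "PredictedLabel")
# Assumed prefixes for per-class scores and probability columns.
WANTED_PREFIXES = ("Score.", "Probabilit")


def keep_prediction_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only prediction columns, drop the rest."""
    keep = [
        c for c in df.columns
        if c in WANTED_EXACT or c.startswith(WANTED_PREFIXES)
    ]
    return df[keep]


# Made-up transformer output that still carries extra fields.
raw = pd.DataFrame({
    "Features.0": [0.1, 0.9],
    "PredictedLabel": [0, 1],
    "Score": [-1.2, 2.3],
    "Probability": [0.23, 0.91],
    "HiddenField": [7, 8],
})

slim = keep_prediction_columns(raw)
print(list(slim.columns))
```

The filter preserves the original column order, so downstream code that relies on position is not affected by the dropped fields.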
New transforms added: LagLeadOperator, SimpleRollingWindow, AnalyticalRollingWindow, ShortDrop, ForecastingPivot
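For readers unfamiliar with the terms, the general idea behind lag/lead and rolling-window features can be sketched with pandas. This is a conceptual analog only: it does not use the nimbusml transforms listed above, and the column names, window size, and choice of a rolling mean are assumptions made for the example.

```python
import pandas as pd

# Made-up time series for illustration.
series = pd.DataFrame({"y": [10.0, 12.0, 11.0, 15.0, 14.0]})

# Lag: the value one step in the past; lead: the value one step ahead.
series["y_lag1"] = series["y"].shift(1)
series["y_lead1"] = series["y"].shift(-1)

# Rolling window: mean over the current row and the two rows before it.
# Rows without a full window are left as NaN.
series["y_roll_mean3"] = series["y"].rolling(window=3).mean()

print(series)
```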
Currently, TreeFeaturizer is generated as a Python user class, but it does not work, and there are no samples or tests. The changes to support it would require supporting the PredictorModel class in the GraphRunner parsing logic...
Repro:

    from nimbusml.datasets import get_dataset
    from nimbusml import FileDataStream
    from nimbusml.preprocessing import OnnxRunner
    from nimbusml.feature_extraction.text import NGramFeaturizer
    from nimbusml.feature_extraction.text.extractor import Ngram

    path = get_dataset("wiki_detox_train").as_filepath()
    data = FileDataStream.read_csv(path, sep='\t')
    transformer...
Repro:

    r0 = Pipeline([MinMaxScaler()])
    r0.fit(train_df)
    r1 = Pipeline([DatasetTransformer(r0.model)])
    r1.fit_transform(train_df)
Currently, a pandas DataFrame with N columns of the same type is loaded into an IDataView with N columns, as opposed to a csr_matrix, which is loaded as an IDataView with 1...
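The two input shapes being contrasted can be sketched as follows. The example shows only the Python-side objects and does not call nimbusml, so it says nothing about how the IDataView is actually built. The column names `f0` to `f2` and the sample values are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Same numeric content, two containers.
values = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 0.0]], dtype=np.float32)

# A DataFrame exposes N separately named columns of the same dtype.
df = pd.DataFrame(values, columns=["f0", "f1", "f2"])

# A csr_matrix is one 2-D object; its columns carry no names.
sparse = csr_matrix(values)

print(len(df.columns), list(df.dtypes.unique()))
print(sparse.shape, sparse.nnz)
```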