NimbusML
Onnx export of ColumnSelector doesn't drop input columns
This is essentially the same issue I filed against ML.NET (https://github.com/dotnet/machinelearning/issues/4970); I am filing it here as well for the record.
Note that OnnxRunner (its result is printed under "ORT RESULT" below) correctly drops "case2" but keeps every other input column, even though ColumnSelector selects only "age".
Code
import numpy
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import OnnxRunner
from nimbusml.preprocessing.schema import ColumnSelector, ColumnDuplicator
from data_frame_tool import DataFrameTool as DFT

# Load the infert dataset into a DataFrame.
path = get_dataset('infert').as_filepath()
dataset = FileDataStream.read_csv(path, sep=',',
                                  numeric_dtype=numpy.float32,
                                  names={0: 'row_num', 5: 'case'})
dataset = dataset.to_df()

# Duplicate 'case' into 'case2', then keep only 'age'.
pipeline = Pipeline([
    ColumnDuplicator(columns={'case2': 'case'}),
    ColumnSelector(columns=['age']),
])

# Baseline: run the pipeline directly.
print("\n\nML.NET RESULT")
result_expected = pipeline.fit_transform(dataset)
print(result_expected)

# Export to ONNX and run the exported model through NimbusML's OnnxRunner.
print("\n\nORT RESULT")
onnx_path = "C:\\Users\\anvelazq\\Desktop\\is25colsel\\colsel.onnx"
pipeline.export_to_onnx(onnx_path, 'com.microsoft.ml')
onnxrunner = OnnxRunner(model_file=onnx_path)
result_onnx = onnxrunner.fit_transform(dataset)
print(result_onnx)

# Run the same exported model through the DataFrameTool helper.
print("\n\nONNX RUNNER RESULT")
df_tool = DFT(onnx_path)
result_ort = df_tool.execute(dataset, [])
print(result_ort)
Output
ML.NET RESULT
      age
0    26.0
1    42.0
2    39.0
3    34.0
4    35.0
..    ...
243  31.0
244  34.0
245  35.0
246  29.0
247  23.0
[248 rows x 1 columns]

ORT RESULT
     row_num  education   age  parity  induced  case  spontaneous  stratum  pooled.stratum
0        1.0     0-5yrs  26.0     6.0      1.0   1.0          2.0      1.0             3.0
1        2.0     0-5yrs  42.0     1.0      1.0   1.0          0.0      2.0             1.0
2        3.0     0-5yrs  39.0     6.0      2.0   1.0          0.0      3.0             4.0
3        4.0     0-5yrs  34.0     4.0      2.0   1.0          0.0      4.0             2.0
4        5.0    6-11yrs  35.0     3.0      1.0   1.0          1.0      5.0            32.0
..       ...        ...   ...     ...      ...   ...          ...      ...             ...
243    244.0    12+ yrs  31.0     1.0      0.0   0.0          1.0     79.0            45.0
244    245.0    12+ yrs  34.0     1.0      0.0   0.0          0.0     80.0            47.0
245    246.0    12+ yrs  35.0     2.0      2.0   0.0          0.0     81.0            54.0
246    247.0    12+ yrs  29.0     1.0      0.0   0.0          1.0     82.0            43.0
247    248.0    12+ yrs  23.0     1.0      0.0   0.0          1.0     83.0            40.0
[248 rows x 9 columns]

ONNX RUNNER RESULT
     age.output
0          26.0
1          42.0
2          39.0
3          34.0
4          35.0
..          ...
243        31.0
244        34.0
245        35.0
246        29.0
247        23.0
[248 rows x 1 columns]
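
For anyone who wants to turn the comparison above into a quick check, here is a minimal sketch. The helper name `compare_columns` and its `suffix` parameter are made up for illustration and are not part of NimbusML or ML.NET. It works on plain column-name lists, so something like `result_onnx.columns` could presumably be passed in as well. The column names below are copied from the output above.

```python
def compare_columns(expected, actual, suffix=""):
    """Return (missing, extra) column names of `actual` relative to `expected`.

    `suffix` is an optional trailing string (e.g. ".output") to strip from
    the names in `actual` before comparing. Hypothetical helper, for
    illustration only.
    """
    expected = list(expected)
    normalized = [
        name[:-len(suffix)] if suffix and name.endswith(suffix) else name
        for name in actual
    ]
    missing = [name for name in expected if name not in normalized]
    extra = [name for name in normalized if name not in expected]
    return missing, extra


# Column names taken from the three results printed above.
expected_cols = ["age"]
onnxrunner_cols = ["row_num", "education", "age", "parity", "induced",
                   "case", "spontaneous", "stratum", "pooled.stratum"]
dft_cols = ["age.output"]

# OnnxRunner result: every input column except 'age' shows up as extra.
print(compare_columns(expected_cols, onnxrunner_cols))

# DataFrameTool result: matches once the '.output' suffix is stripped.
print(compare_columns(expected_cols, dft_cols, suffix=".output"))
```

A check like this only compares column names, not values, so it would flag the extra columns reported here but would not catch differences in the data itself.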