NimbusML
Loading a classifier model from disk does not preserve column dtype when calling test()
When a model is loaded from disk, the transforms_predictedlabelcolumnoriginalvalueconverter node is not added to the pipeline. As a result, the PredictedLabel column in the output of test() has dtype int32 rather than the expected int64.
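The mismatch can be shown in a minimal pandas sketch, assuming integer class labels. The two DataFrames below are hypothetical stand-ins for the `score` frames returned by `test(output_scores=True)` (values invented for illustration). `restore_predicted_label_dtype` is a hypothetical helper, not part of NimbusML, that casts the column back as a stopgap.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the scores returned by test(output_scores=True):
# a freshly fitted pipeline yields an int64 PredictedLabel, while a pipeline
# loaded from disk yields int32. Values are invented for illustration.
score = pd.DataFrame({"PredictedLabel": np.array([0, 1, 1], dtype=np.int64),
                      "Score": [-1.2, 0.8, 2.3]})
score2 = pd.DataFrame({"PredictedLabel": np.array([0, 1, 1], dtype=np.int32),
                       "Score": [-1.2, 0.8, 2.3]})

# The same kind of comparison the regression test makes on the first column.
print(score.dtypes.iloc[0].name, score2.dtypes.iloc[0].name)  # int64 int32


def restore_predicted_label_dtype(scores, dtype="int64"):
    """Hypothetical workaround: return a copy with PredictedLabel cast to dtype."""
    fixed = scores.copy()
    fixed["PredictedLabel"] = fixed["PredictedLabel"].astype(dtype)
    return fixed


score2_fixed = restore_predicted_label_dtype(score2)
print(score2_fixed.dtypes.iloc[0].name)  # int64
```

The cast only papers over the symptom for integer labels. It is not a substitute for the converter node, which maps predicted labels back to their original values.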
Add the following test to the end of src\python\nimbusml\tests\pipeline\test_load_save.py to reproduce the issue:
def test_saving_loading_pipeline_model_does_not_change_dtype(self):
    # Fit a classifier pipeline in memory and score the test set.
    model_nimbusml = Pipeline(
        steps=[
            ('cat',
             OneHotVectorizer() << categorical_columns),
            ('linear',
             FastLinearBinaryClassifier(
                 shuffle=False,
                 number_of_threads=1))])
    model_nimbusml.fit(train, label)
    metrics, score = model_nimbusml.test(test, test_label, output_scores=True)
    # Round-trip the model through disk and score the same data again;
    # evaltype is passed explicitly for the loaded pipeline.
    model_nimbusml.save_model('model.nimbusml.m')
    model_nimbusml_load = Pipeline()
    model_nimbusml_load.load_model('model.nimbusml.m')
    metrics2, score2 = model_nimbusml_load.test(test,
                                                test_label,
                                                output_scores=True,
                                                evaltype="binary")
    # The first column (PredictedLabel) should have the same dtype both times.
    self.assertEqual(score.dtypes[0].name,
                     score2.dtypes[0].name)
    os.remove('model.nimbusml.m')
This affects any classifier, because the first branch of the following if statement in _predict is skipped when the model is loaded from disk (i.e., when steps is undefined or empty):
def _predict(self, X, y=None,
             ...
    if hasattr(self, 'steps') and len(self.steps) > 0 \
            and self.last_node.type == 'classifier':
        select_node = transforms_scorecolumnselector(
            data="$scoredVectorData",
            output_data="$scoreColumnsOnlyData", score_column="Score")
        convert_label_node = \
            transforms_predictedlabelcolumnoriginalvalueconverter(
                data="$scoreColumnsOnlyData",
                predicted_label_column="PredictedLabel",
                output_data="$output_data")
        all_nodes.extend([select_node, convert_label_node])
    else:
        select_node = transforms_scorecolumnselector(
            data="$scoredVectorData",
            output_data="$output_data", score_column="Score")
        all_nodes.extend([select_node])
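The branch can be modeled in isolation with a simplified, hypothetical sketch. `scoring_nodes` is not a NimbusML function, node names are plain strings rather than real entrypoint nodes, and `SimpleNamespace` objects stand in for a fitted and a loaded `Pipeline`. It mirrors the condition above and shows that a pipeline without steps never gets the label converter.

```python
from types import SimpleNamespace


def scoring_nodes(pipeline):
    """Hypothetical model of the branch in _predict; returns node names only."""
    if hasattr(pipeline, "steps") and len(pipeline.steps) > 0 \
            and pipeline.last_node.type == "classifier":
        # In-memory classifier: select score columns, then convert
        # PredictedLabel back to the original label values.
        return ["transforms_scorecolumnselector",
                "transforms_predictedlabelcolumnoriginalvalueconverter"]
    # No steps (e.g. after load_model()): only the score selector is added,
    # so PredictedLabel is never converted (int32 per this issue).
    return ["transforms_scorecolumnselector"]


classifier = SimpleNamespace(type="classifier")
fitted = SimpleNamespace(steps=[("linear", classifier)], last_node=classifier)
loaded_no_steps = SimpleNamespace()          # steps undefined
loaded_empty_steps = SimpleNamespace(steps=[])  # steps empty

print(scoring_nodes(fitted))
print(scoring_nodes(loaded_no_steps))
print(scoring_nodes(loaded_empty_steps))
```

Both loaded variants fall through to the else branch, which is why any classifier restored with load_model() loses the PredictedLabel conversion.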