NimbusML
Error loading a model that was saved with mlnet auto-train
Describe the bug When a model is created with the mlnet auto-train tool and then loaded with NimbusML, an exception is thrown.
To Reproduce Steps to reproduce the behavior:
- Run mlnet auto-train --dataset ... --task ... to create an ML.NET .zip model file.
- Using NimbusML, attempt to load that model file and score some data like the following:
from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('TrainingData.csv')
pipeline = Pipeline()
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='target', evaltype='binary')
Expected behavior Loading and scoring the model should work as expected.
Actual behavior You get an exception and scoring is not completed:
Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'
Parameter name: input'
Traceback (most recent call last):
File "nimbus.py", line 7, in <module>
scores = pipeline.predict(test_df, evaltype='binary')
File "C:\Users\eric\Omni\venv\lib\site-packages\nimbusml\internal\utils\utils.py", line 220, in wrapper
params = func(*args, **kwargs)
File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2228, in predict
as_binary_data_stream=as_binary_data_stream, **params)
File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\utils.py", line 220, in wrapper
params = func(*args, **kwargs)
File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2172, in _predict
raise e
File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2169, in _predict
**params)
File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 449, in run
output_predictor_modelfilename)
File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 306, in _try_call_bridge
raise e
File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 278, in _try_call_bridge
ret = px_call(call_parameters)
RuntimeError: Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'
Parameter name: input'
Desktop (please complete the following information):
OS: Windows
Browser N/A
Version 1.6.1
Additional Context
- I attempted to solve this by adding an additional column named 'PredictedLabel' inside of 'TrainingData.csv' but it gave the same error
@RokoToken thank you for reporting this. Could you share the model.zip and a small subset of TrainingData.csv so we can reproduce this issue? Thanks.
Modified Titanic CSV Dataset
survived,sex,class,deck,embark_town,alone
TRUE,male,Third,unknown,Southampton,n
TRUE,female,First,C,Cherbourg,n
TRUE,female,Second,unknown,Southampton,y
MLNet CLI Command:
mlnet auto-train --task multiclass-classification --dataset "titanic.csv" --label-column-name "class"
Nimbus Code:
from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('titanic.csv')
pipeline = Pipeline()
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='class', evaltype='binary')
print(scores)
Error:
Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'
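A side note on the repro above: the CLI command trains with --task multiclass-classification, while the scoring call passes evaltype='binary'. Whether that mismatch contributes to the error is not established in this thread, but counting the distinct values in the label column is a quick way to pick a matching evaltype. The sketch below uses only the Python standard library. suggest_evaltype is a hypothetical helper, not part of NimbusML, and the 'multiclass' string is an assumption about the values NimbusML accepts.

```python
import csv
import io


def distinct_label_values(csv_text, label_column):
    """Return the sorted distinct values found in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[label_column] for row in reader})


def suggest_evaltype(values):
    # Hypothetical helper: up to two classes -> 'binary', more -> 'multiclass'.
    # The 'multiclass' string is an assumption; check the NimbusML docs.
    return 'binary' if len(values) <= 2 else 'multiclass'


# The three sample rows posted in this thread.
SAMPLE = """survived,sex,class,deck,embark_town,alone
TRUE,male,Third,unknown,Southampton,n
TRUE,female,First,C,Cherbourg,n
TRUE,female,Second,unknown,Southampton,y
"""

labels = distinct_label_values(SAMPLE, 'class')
print(labels, suggest_evaltype(labels))
```

On the sample rows, the 'class' column has three distinct values, so a binary evaluation would not match the label the model was trained on.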
There was a similar issue 6 months ago -- https://github.com/microsoft/NimbusML/issues/201 -- where we were fixing NimbusML scoring of models trained with the AutoML.NET CLI.
@RokoToken: Can you post your MLModel.zip? Also, which version of the CLI are you using? mlnet --version
@justinormont @ganik mlnet version = 0.15.28007.4 @BuiltBy: dlab14-DDVSOWINAGE054 MLModel.zip
Is there a workaround for this? Should I use an older version of MLNet CLI? Is there a way to modify the output column through the Nimbus pipeline? Something like:
from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('titanic.csv')
pipeline = Pipeline( add_output_column=PredictedLabel )
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='class', evaltype='binary')
print(scores)
@RokoToken, the workaround is to find the pipeline parameters from AutoML.NET and retrain the same pipeline using either plain ML.NET or NimbusML. Also, can you try using pipeline.score(...)?
@ganik: Do you see anything odd with the posted model?
@RokoToken: I would expect that the AutoML.NET CLI is producing a normal ML.NET model. Your current version is the newest released version.
You can also re-train your model from the generated code which the CLI produced. You can uncomment the line ModelBuilder.CreateModel(), and run the project. You can also update the project requirements, as the codegen references an older version of ML.NET.
@RokoToken sorry for the delay. Could you please share the titanic.csv file? The model does look OK, so it should work. Thanks.
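For anyone who wants to look inside the model file themselves: the CLI writes the model as a .zip archive, so its entries can be listed with Python's standard zipfile module. This is only a sketch. list_model_entries is a hypothetical helper, and the demo builds a stand-in archive in memory with a made-up entry name, since the real entry names depend on the model.

```python
import io
import zipfile


def list_model_entries(zip_source):
    """Return (name, size_in_bytes) for every entry in a .zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Stand-in archive with a made-up entry; a real run would pass the path
# to the model file instead, e.g. list_model_entries("MLModel.zip").
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('Example/Entry.txt', 'placeholder')

buffer.seek(0)
for name, size in list_model_entries(buffer):
    print(name, size)
```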
I was able to debug through and get scoring after a few fixes in the NimbusML Python code (not ML.NET). However, the returned scores are NaN. Script:
from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('E:/sources/tmp/titanic.csv')
print(dataset.head(3))
pipeline = Pipeline()
pipeline.load_model("E:/sources/tmp/MLModel.zip")
scores = pipeline.predict(dataset)
print(scores.head(3))
and output:
@justinormont Could you see if you can score this in ML.NET. I am not getting any scores from this model.
I used this CSV test file below:
survived,sex,class,deck,embark_town,alone
TRUE,male,Third,unknown,Southampton,n
TRUE,female,First,C,Cherbourg,n
TRUE,female,Second,unknown,Southampton,y