
Error loading a model that was saved with mlnet auto-train

Open RokoToken opened this issue 5 years ago • 9 comments

Describe the bug: When a model is created with the mlnet auto-train tool and then loaded using NimbusML, an exception is thrown.

To Reproduce Steps to reproduce the behavior:

  1. Run mlnet auto-train --dataset ... --task ... to create an ML.NET .zip model file.
  2. Using NimbusML, attempt to load that model file and score some data like the following:
dataset = FileDataStream.read_csv('TrainingData.csv')
pipeline = Pipeline()
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='target', evaltype='binary')

Expected behavior Loading and scoring the model should work as expected.

Actual behavior You get an exception and scoring is not completed:

Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'
Parameter name: input'
Traceback (most recent call last):
  File "nimbus.py", line 7, in <module>
    scores = pipeline.predict(test_df, evaltype='binary')
  File "C:\Users\eric\Omni\venv\lib\site-packages\nimbusml\internal\utils\utils.py", line 220, in wrapper
    params = func(*args, **kwargs)
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2228, in predict
    as_binary_data_stream=as_binary_data_stream, **params)
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\utils.py", line 220, in wrapper
    params = func(*args, **kwargs)
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2172, in _predict
    raise e
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\pipeline.py", line 2169, in _predict
    **params)
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 449, in run
    output_predictor_modelfilename)
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 306, in _try_call_bridge
    raise e
  File "C:\Users\eric\venv\lib\site-packages\nimbusml\internal\utils\entrypoints.py", line 278, in _try_call_bridge
    ret = px_call(call_parameters)
RuntimeError: Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'
Parameter name: input'

Desktop (please complete the following information):

OS: Windows
Browser N/A
Version 1.6.1

Additional Context

  • I attempted to solve this by adding an additional column named 'PredictedLabel' inside of 'TrainingData.csv' but it gave the same error

RokoToken avatar Jan 29 '20 03:01 RokoToken

@RokoToken thank you for reporting this. Could you share the model.zip and a small subset of TrainingData.csv so we can repro this issue? Thanks.

ganik avatar Jan 29 '20 04:01 ganik
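For sharing a small subset of a CSV file as requested above, a minimal sketch using only the Python standard library could look like the following. The function name `write_csv_subset` and the default of three rows are illustrative assumptions, not something from NimbusML or the mlnet CLI.

```python
import csv
from itertools import islice


def write_csv_subset(src_path, dst_path, max_rows=3):
    """Copy the header plus the first max_rows data rows.

    Returns the number of data rows written (0 for an empty input file).
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        count = 0
        # islice stops after max_rows rows without reading the whole file
        for row in islice(reader, max_rows):
            writer.writerow(row)
            count += 1
        return count
```

Calling `write_csv_subset("TrainingData.csv", "subset.csv")` would produce a file with the original header and up to three rows, which is usually enough to reproduce a schema-related error.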

Modified Titanic CSV Dataset

survived,sex,class,deck,embark_town,alone
TRUE,male,Third,unknown,Southampton,n
TRUE,female,First,C,Cherbourg,n
TRUE,female,Second,unknown,Southampton,y

MLNet CLI Command:

mlnet auto-train --task multiclass-classification --dataset "titanic.csv" --label-column-name "class" 

Nimbus Code:

from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('titanic.csv')
pipeline = Pipeline()
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='class', evaltype='binary')
print(scores)

Error:

Error: *** System.ArgumentOutOfRangeException: 'Could not find label column 'PredictedLabel'

RokoToken avatar Jan 29 '20 22:01 RokoToken
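One detail worth checking in the repro above: the CLI command trains with `--task multiclass-classification`, while the NimbusML call passes `evaltype='binary'`. Whether this mismatch contributes to the error is not established in this thread. The sketch below shows one way to keep the two values in sync. The helper name `evaltype_for_task` is hypothetical, and the mapping assumes the CLI task names `binary-classification`, `multiclass-classification`, and `regression` and the evaltype strings `binary`, `multiclass`, and `regression`.

```python
# Hypothetical helper: map an "mlnet auto-train --task" value to the
# evaltype string passed to NimbusML. The mapping itself is an assumption.
_TASK_TO_EVALTYPE = {
    "binary-classification": "binary",
    "multiclass-classification": "multiclass",
    "regression": "regression",
}


def evaltype_for_task(task: str) -> str:
    """Return the evaltype matching a CLI task name, or raise ValueError."""
    key = task.strip().lower()
    if key not in _TASK_TO_EVALTYPE:
        raise ValueError(f"unrecognized mlnet task: {task!r}")
    return _TASK_TO_EVALTYPE[key]
```

For the Titanic repro, `evaltype_for_task("multiclass-classification")` yields `"multiclass"`, which could then be passed as the `evaltype` argument in place of the hard-coded `'binary'`.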

There was a similar issue 6mo ago -- https://github.com/microsoft/NimbusML/issues/201 -- We were fixing NimbusML scoring of models trained in the AutoML.NET CLI.

@RokoToken: Can you post your MLModel.zip? Also, which version of the CLI are you using? mlnet --version

justinormont avatar Jan 29 '20 23:01 justinormont

@justinormont @ganik
mlnet version = 0.15.28007.4 @BuiltBy: dlab14-DDVSOWINAGE054
MLModel.zip

RokoToken avatar Jan 30 '20 00:01 RokoToken

Is there a workaround for this? Should I use an older version of MLNet CLI? Is there a way to modify the output column through the Nimbus pipeline? Something like:

from nimbusml import Pipeline, FileDataStream
dataset = FileDataStream.read_csv('titanic.csv')
pipeline = Pipeline( add_output_column=PredictedLabel )
pipeline.load_model("MLModel.zip")
scores = pipeline.predict(dataset, y='class', evaltype='binary')
print(scores)

RokoToken avatar Feb 03 '20 21:02 RokoToken

@RokoToken, the workaround is to take the pipeline parameters from AutoML.NET and re-train the same pipeline using either plain ML.NET or NimbusML. Also, can you try using pipeline.score(...)?

ganik avatar Feb 03 '20 22:02 ganik

@ganik: Do you see anything odd with the posted model?

@RokoToken: I would expect that the AutoML.NET CLI is producing a normal ML.NET model. Your current version is the newest released version.

You can also re-train your model from the generated code which the CLI produced. You can uncomment the line ModelBuilder.CreateModel(), and run the project. You can also update the project requirements, as the codegen references an older version of ML.NET.

justinormont avatar Feb 04 '20 06:02 justinormont

@RokoToken sorry for the delay. Could you please share the titanic.csv file? The model does look OK, so it should work. Thanks.

ganik avatar Feb 26 '20 00:02 ganik

I was able to debug through and get scoring after a few fixes in the NimbusML Python code (not ML.NET). However, the returned scores are NaN. Script:

from nimbusml import Pipeline, FileDataStream

dataset = FileDataStream.read_csv('E:/sources/tmp/titanic.csv')
print(dataset.head(3))

pipeline = Pipeline()
pipeline.load_model("E:/sources/tmp/MLModel.zip")
scores = pipeline.predict(dataset)
print(scores.head(3))

and output: image

@justinormont Could you see if you can score this in ML.NET? I am not getting any scores from this model. I used the CSV test file below:

survived,sex,class,deck,embark_town,alone
TRUE,male,Third,unknown,Southampton,n
TRUE,female,First,C,Cherbourg,n
TRUE,female,Second,unknown,Southampton,y

ganik avatar Feb 26 '20 23:02 ganik
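To confirm which output columns come back entirely as NaN, a small check like the one below may help. It is a sketch under stated assumptions: the helper name `all_nan_columns` is hypothetical, it operates on plain Python dicts rather than on NimbusML objects, and it assumes the scores are a pandas DataFrame that can be converted with `scores.to_dict("records")`.

```python
import math


def all_nan_columns(rows):
    """Return the names of columns whose value is NaN in every row.

    rows: a list of dicts mapping column name to value. Only float
    values are tested for NaN; strings and ints never count as NaN.
    """
    if not rows:
        return []

    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    # Column names are taken from the first row, in their original order.
    return [
        name
        for name in rows[0]
        if all(is_nan(row.get(name)) for row in rows)
    ]
```

For example, `all_nan_columns(scores.to_dict("records"))` would list the score columns that contain no usable values, which makes it easier to report exactly which outputs are affected.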