machinelearning [AutoML] Can't manually fit AutoML pipeline after training

System Information (please complete the following information):

OS & Version: Windows 11
ML.NET Version: Microsoft.ML.AutoML 0.20.0-preview.22226.2
.NET Version: .NET 6.0

Describe the bug

I've configured and run an AutoML experiment with the following pipeline:

var pipeline =
    mlContext.Transforms.Categorical.OneHotEncoding(new[] { new InputOutputColumnPair(@"vendor_id", @"vendor_id"), new InputOutputColumnPair(@"payment_type", @"payment_type") }, outputKind: OutputKind.Binary)
        .Append(mlContext.Transforms.ReplaceMissingValues(new[] { new InputOutputColumnPair(@"rate_code", @"rate_code"), new InputOutputColumnPair(@"passenger_count", @"passenger_count"), new InputOutputColumnPair(@"trip_time_in_secs", @"trip_time_in_secs"), new InputOutputColumnPair(@"trip_distance", @"trip_distance") }))
        .Append(mlContext.Transforms.Concatenate(@"Features", new[] { @"vendor_id", @"payment_type", @"rate_code", @"passenger_count", @"trip_time_in_secs", @"trip_distance" }))
        .Append(mlContext.Auto().Regression(labelColumnName: "fare_amount"));

var experiment =
    mlContext.Auto().CreateExperiment()
        .SetPipeline(pipeline)
        .SetTrainingTimeInSeconds(60)
        .SetDataset(trainSet, validationSet)
        .SetEvaluateMetric(RegressionMetric.RSquared, "fare_amount", "Score");

Training is successful. I then use the BuildPipeline method to get the transforms and trainer used by AutoML to train my model. When I try to call Fit to fit the pipeline to my data, I get the following error:

System.OperationCanceledException: Operation was canceled.
   at Microsoft.ML.Runtime.Contracts.CheckAlive(IHostEnvironment env)
   at Microsoft.ML.Transforms.ValueToKeyMappingTransformer.Train(IHostEnvironment env, IChannel ch, ColInfo[] infos, IDataView keyData, ColumnOptionsBase[] columns, IDataView trainingData, Boolean autoConvert)
   at Microsoft.ML.Transforms.ValueToKeyMappingTransformer..ctor(IHostEnvironment env, IDataView input, ColumnOptionsBase[] columns, IDataView keyData, Boolean autoConvert)
   at Microsoft.ML.Transforms.ValueToKeyMappingEstimator.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Transforms.OneHotEncodingTransformer..ctor(ValueToKeyMappingEstimator term, IEstimator`1 toVector, IDataView input)
   at Microsoft.ML.Transforms.OneHotEncodingEstimator.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Submission#32.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

However, when I create a new instance of MLContext and manually recreate the pipeline (replacing the AutoML sweeping estimator with the actual trainer), the model trains successfully.

Expected result

Model trains successfully.

May 29 '22 17:05 luisquintanilla

You probably need to use a new context when reconstructing the pipeline with the best parameters:

var anotherContext = new MLContext();
var autoMLPipeline = result.TrialSettings.Pipeline.BuildTrainingPipeline(anotherContext, result.TrialSettings.Pipeline.Parameter);

This is because AutoMLExperiment calls context.CancelExecution after the experiment is done in order to cancel all running trials. As a result, context.CheckAlive fails on the next training run performed on that context.
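The behavior can be illustrated without ML.NET at all. The sketch below is only an analogy: FakeContext is a hypothetical stand-in, and it assumes the context guards its work with a standard .NET CancellationTokenSource, which cannot be returned to an un-cancelled state once Cancel has been called. It is not ML.NET's actual implementation.

```csharp
using System;
using System.Threading;

public static class CancellationDemo
{
    // Hypothetical stand-in for a context that owns a cancellation source.
    // The method names mirror the ones mentioned above, but this is not ML.NET code.
    private sealed class FakeContext
    {
        private readonly CancellationTokenSource _cts = new CancellationTokenSource();

        // Marks the context as cancelled; there is no way to undo this.
        public void CancelExecution() => _cts.Cancel();

        // Throws OperationCanceledException once cancellation was requested.
        public void CheckAlive() => _cts.Token.ThrowIfCancellationRequested();
    }

    public static void Main()
    {
        var context = new FakeContext();
        context.CheckAlive();        // passes: nothing has been cancelled yet
        context.CancelExecution();   // what the experiment does when it finishes

        try
        {
            context.CheckAlive();    // every later check on this context throws
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("old context: canceled");
        }

        // A fresh context has its own, un-cancelled token source.
        var anotherContext = new FakeContext();
        anotherContext.CheckAlive();
        Console.WriteLine("new context: alive");
    }
}
```

Under that assumption, the only way to train again is to hand the pipeline a context whose token was never cancelled, which is what creating a new MLContext does.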

@michaelgsharp is there a specific reason the context is left in a dead state after context.CancelExecution is called? It seems unnecessary, since cancelling a running trial is fairly common in the notebook use case, and it would be better to be able to reuse the context even after it has been cancelled.

Jun 06 '22 22:06 LittleLittleCloud

@LittleLittleCloud I'm not sure. We would have to investigate and see if there is a better way of doing that. Essentially, we would need a way to "reset" the cancellation token (or provide a new one, etc.). What are your thoughts on this?

Jun 13 '22 17:06 michaelgsharp

@LittleLittleCloud @luisquintanilla any further thoughts on this? Or should we close it for now?

Oct 10 '22 19:10 michaelgsharp

@michaelgsharp you can close this as resolved. Currently a new context is created for each trial, so this problem no longer exists.

Oct 10 '22 19:10 LittleLittleCloud
