
Permutation Feature Importance for MulticlassClassification still seems to have issues

Open acrigney opened this issue 3 years ago • 16 comments

System Information (please complete the following information):

  • OS & Version: [e.g. Windows 10]
  • ML.NET Version: [e.g. ML.NET v1.5.5]
  • .NET Version: [e.g. .NET 5.0]

Describe the bug: I see that code for a new API for this has been released in PermutationFeatureImportanceExtensions.cs:

https://github.com/dotnet/machinelearning/blob/main/src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs

But it uses MulticlassClassificationCatalog, which is an internal class, so I am trying to get the old way to work. But I get an error about incompatible feature column types. Here is an example:

'Incompatible features column type: 'Vector<Single, 2>' vs 'Vector<Single, 4>''

Here is some example code; I get the same type of error with my actual code.



Screenshots, Code, Sample Projects: see the attached PFI.txt.


acrigney avatar Aug 12 '22 06:08 acrigney

It looks to me like MulticlassClassificationCatalog is not internal: https://source.dot.net/#Microsoft.ML.Data/TrainCatalog.cs,cd7e47788ba38baa. It is sealed, but that only prevents you from inheriting from it. Can you clarify what exactly the issue was?
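
For illustration, here is a minimal sketch (nothing beyond the public ML.NET surface) showing that the catalog is consumed through the property on MLContext, since sealed only blocks inheritance, not use:

using Microsoft.ML;

var mlContext = new MLContext(seed: 1);

// MulticlassClassificationCatalog is public but sealed: you cannot derive
// from it, yet it is fully usable via the MLContext property.
MulticlassClassificationCatalog catalog = mlContext.MulticlassClassification;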

dakersnar avatar Aug 22 '22 20:08 dakersnar

This issue has been marked needs-author-action and may be missing some important information.

ghost avatar Aug 22 '22 20:08 ghost

Sorry guys, yes, MulticlassClassificationCatalog is not internal, but I could not find how to use it. It's actually on the MLContext!

So my model generator class has this MultiClassPermutationFeatureImportance property:

public partial class MLModelGenerator
{
    public ImmutableDictionary<string, MulticlassClassificationMetricsStatistics> MultiClassPermutationFeatureImportance { get; private set; }

    public void GetMulticlassificationPermutationFeatureImportance()
    {
        MultiClassPermutationFeatureImportance = _mlContext.MulticlassClassification.PermutationFeatureImportance(_saveModel, _trainDataView, labelColumnName: "Label");

        var featureImportanceMetrics =
            MultiClassPermutationFeatureImportance
         .Select((metric, index) => new { index, metric })
         .OrderByDescending(myFeatures => myFeatures.metric.Value);
        
        foreach (var feature in featureImportanceMetrics)
        {
            Debug.WriteLine($"{_selectedFeatureNames[feature.index],-20}|\t{feature.metric.Value:F6}");
        }

...

But the error I get now is that I have incompatible column types between my data view and my model: 'Incompatible features column type: 'Vector<Single, 54>' vs 'Vector<Single, 6484>''

I have a lot of multi dimensional vectors.

acrigney avatar Aug 29 '22 07:08 acrigney

@acrigney which line specifically throws that error? Any chance you could provide a repro for me to look into deeper?

dakersnar avatar Sep 01 '22 23:09 dakersnar

Mate, the repro to test this has always been in the issue here and in my support request to Microsoft. It's at the start of the conversation: PFI.txt

acrigney avatar Sep 05 '22 00:09 acrigney

No update on this after 2 weeks; I thought you guys cared?

acrigney avatar Sep 19 '22 01:09 acrigney

@acrigney Huge apologies, there was a miscommunication when routing this issue to the appropriate person. We will have a follow up for you shortly.

dakersnar avatar Sep 19 '22 14:09 dakersnar

@acrigney taking a look now.

luisquintanilla avatar Sep 19 '22 14:09 luisquintanilla

@acrigney as @dakersnar mentioned, would you be able to provide the code to repro this issue? The PFI.txt file does not allow us to repro and investigate what your issue is. I see you're using a "Label" column in your call to Multiclass PFI, but in the PFI.txt file, there is no "Label" column and the output from applying the ProjectToPrincipalComponents does not produce a "Label" column. Additionally, ProjectToPrincipalComponents is not part of the multiclass catalog. So to be able to repro and investigate, we'd need code similar to the one you're running.
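
For what it's worth, a quick sketch for inspecting what the transformed data actually contains before calling PFI (column and variable names taken from the snippets in this thread):

// Print every column name and type, including vector sizes, so mismatches
// like 'Vector<Single, 2>' vs 'Vector<Single, 4>' are visible up front.
foreach (var column in transformedData.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");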

luisquintanilla avatar Sep 19 '22 15:09 luisquintanilla

Sorry guys, I have to get onto this later; I had meetings on other stuff today and more for the rest of the day.

acrigney avatar Sep 20 '22 03:09 acrigney

Guys, I added the Label in but it still gives the error on this call:

var permutationMetrics = mlContext.MulticlassClassification
    .PermutationFeatureImportance(linearPredictor, transformedData, labelColumnName: "Label",
    permutationCount: 30);

It throws: Incompatible features column type: 'Vector<Single, 2>' vs 'Vector<Single, 4>'

using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace PFTTest
{
internal class Program
{
    private class Data
    {
        public float Label { get; set; }

        public float Feature1 { get; set; }

        public float Feature2 { get; set; }
    }

    /// <summary>
    /// Generate an enumerable of Data objects, creating the label as a simple
    /// linear combination of the features.
    /// </summary>
    /// <param name="nExamples">The number of examples.</param>
    /// <param name="bias">The bias, or offset, in the calculation of the
    /// label.</param>
    /// <param name="weight1">The weight to multiply the first feature with to
    /// compute the label.</param>
    /// <param name="weight2">The weight to multiply the second feature with to
    /// compute the label.</param>
    /// <param name="seed">The seed for generating feature values and label
    /// noise.</param>
    /// <returns>An enumerable of Data objects.</returns>
    private static IEnumerable<Data> GenerateData(int nExamples = 10000,
        double bias = 0, double weight1 = 1, double weight2 = 2, int seed = 1)
    {
        var rng = new Random(seed);
        var max = bias + 4.5 * weight1 + 4.5 * weight2 + 0.5;
        for (int i = 0; i < nExamples; i++)
        {
            var data = new Data
            {
                Feature1 = (float)(rng.Next(10) * (rng.NextDouble() - 0.5)),
                Feature2 = (float)(rng.Next(10) * (rng.NextDouble() - 0.5)),
            };

            // Create a noisy label.
            var value = (float)
                (bias + weight1 * data.Feature1 + weight2 * data.Feature2 +
                rng.NextDouble() - 0.5);

            if (value < max / 3)
                data.Label = 0;
            else if (value < 2 * max / 3)
                data.Label = 1;
            else
                data.Label = 2;
            yield return data;
        }
    }
    static void Main(string[] args)
    {
        // Create a new context for ML.NET operations. It can be used for
        // exception tracking and logging, as a catalog of available operations
        // and as the source of randomness.
        var mlContext = new MLContext(seed: 1);

        // Create sample data.
        var samples = GenerateData();

        // Load the sample data as an IDataView.
        var data = mlContext.Data.LoadFromEnumerable(samples);

        // Define a training pipeline that concatenates features into a vector,
        // normalizes them, and then trains a linear model.
        var featureColumns =
            new string[] { nameof(Data.Feature1), nameof(Data.Feature2) };

        var pipeline = mlContext.Transforms
            .Concatenate("Features", featureColumns)
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        //var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: labelColumnName).
        //                  Append(mlContext.Transforms.Concatenate("Features", _features)).AppendCacheCheckpoint(mlContext);




        // Fit the pipeline to the data.
        var model = pipeline.Fit(data);

        // Transform the dataset.
        var transformedData = model.Transform(data);

        var sch = transformedData.Schema;

        ExperimentResult<MulticlassClassificationMetrics> experimentResult = mlContext.Auto()
       .CreateMulticlassClassificationExperiment(60)
       .Execute(transformedData, "Label", progressHandler: new MulticlassExperimentProgressHandler());

        var concatenatorTransformer = pipeline.Fit(data);
        var bestModel = concatenatorTransformer.Append(experimentResult.BestRun.Model);

        var lastTransformer = ((TransformerChain<ITransformer>)bestModel).LastTransformer;

        var linearPredictor = (bestModel as TransformerChain<ITransformer>).LastTransformer;
        //var linearPredictor = (bestModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<MaximumEntropyModelParameters>;

        //var linearPredictor = (ISingleFeaturePredictionTransformer<object>)lastTransformer;

        var msch = experimentResult.BestRun.Model.GetOutputSchema(sch);
        //var xx = linearPredictor.Preview(data, 4);

        // Compute the permutation metrics for the linear model using the
        // normalized data.
        var permutationMetrics = mlContext.MulticlassClassification
            .PermutationFeatureImportance(linearPredictor, transformedData, labelColumnName: "Label",
            permutationCount: 30);

        // Gets error Incompatible features column type: 'Vector<Single, 2>' vs 'Vector<Single, 4>'
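
        // A possible explanation (an assumption, not confirmed in this thread):
        // the AutoML experiment re-featurizes its input, so the best run's
        // predictor expects a "Features" vector of a different length (4) than
        // the one produced by the manual pipeline in transformedData (2). PFI
        // requires the features column of the supplied data view to match the
        // predictor's training-time column exactly.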

        // Now let's look at which features are most important to the model
        // overall. Get the feature indices sorted by their impact on
        // microaccuracy.
        //var sortedIndices = permutationMetrics
        //    .Select((metrics, index) => new { index, metrics.MicroAccuracy })
        //    .OrderByDescending(feature => Math.Abs(feature.MicroAccuracy.Mean))
        //    .Select(feature => feature.index);

        //Console.WriteLine("Feature\tChange in MicroAccuracy\t95% Confidence in "
        //    + "the Mean Change in MicroAccuracy");

        //var microAccuracy = permutationMetrics.Select(x => x.MicroAccuracy)
        //    .ToArray();

        //foreach (int i in sortedIndices)
        //{
        //    Console.WriteLine("{0}\t{1:G4}\t{2:G4}",
        //        featureColumns[i],
        //        microAccuracy[i].Mean,
        //        1.96 * microAccuracy[i].StandardError);
        //}
    }

    //public class CustomMulticlassExperimentProgressHandler<T> : IProgress<RunDetail<RegressionMetrics>>
    //{
    //    //private static NLog.Logger _logger = NLog.LogManager.GetCurrentClassLogger();
    //    private CancellationTokenSource _convergenceCancellationToken;
    //    private ModelInput<T> _modelInput;
    //    public int? MaxIterationIndex { get; set; }
    //    public bool UseConvergenceLimit { get; set; }
    //    public int IterationIndex { get; private set; }
    //    public double ConvergenceRsquaredReached { get; private set; }
    //    public double? MaxConvergenceBeforeMinLimitReached { get; private set; }
    //    public double ConvergenceLimit { get; set; }
    //    public uint? MinExperimentTimeInSeconds { get; private set; }

    //    public CustomMulticlassExperimentProgressHandler(CancellationTokenSource convergenceCancellationToken, ModelInput<T> modelInput)
    //    {
    //        _modelInput = modelInput;
    //        _convergenceCancellationToken = convergenceCancellationToken;
    //        UseConvergenceLimit = modelInput.UseConvergenceLimit ?? false;
    //        ConvergenceLimit = modelInput.ConvergenceLimit ?? 0.99;

    //        MinExperimentTimeInSeconds = modelInput.MinExperimentTimeInSeconds;
    //        MaxConvergenceBeforeMinLimitReached = null;
    //    }

    //    public void Report(RunDetail<MulticlassClassificationMetrics> iterationResult)
    //    {
    //        if (IterationIndex++ == 0)
    //        {
    //            ConsoleHelper.PrintMulticlassClassificationMetricsHeader();
    //        }

    //        if (iterationResult.Exception != null)
    //        {
    //            ConsoleHelper.PrintIterationException(iterationResult.Exception);
    //        }
    //        else
    //        {
    //            ConsoleHelper.PrintIterationMetrics(IterationIndex, iterationResult.TrainerName,
    //                iterationResult.ValidationMetrics, iterationResult.RuntimeInSeconds);
    //        }

    //        if ((iterationResult.ValidationMetrics != null) &&
    //            (!double.IsNaN(iterationResult.ValidationMetrics.MicroAccuracy)))
    //        {
    //            if (UseConvergenceLimit)
    //            {
    //                ConvergenceRsquaredReached = iterationResult.ValidationMetrics.MicroAccuracy;

    //                if (ConvergenceRsquaredReached > ConvergenceLimit)
    //                {
    //                    if (MinExperimentTimeInSeconds.HasValue)
    //                    {
    //                        if (iterationResult.RuntimeInSeconds > MinExperimentTimeInSeconds)
    //                        {
    //                            Debug.WriteLine("{0} model converged at specified {1} level after {2} secs.",
    //                                _modelInput.ModelName, ConvergenceLimit, MinExperimentTimeInSeconds);
    //                            //_logger.Info("{0} model converged at specified {1} level after {2} secs.",
    //                            //    _dataSource, ConvergenceLimit, MinExperimentTimeInSeconds);
    //                            _convergenceCancellationToken.Cancel();
    //                        }
    //                        else
    //                        {
    //                            if (MaxConvergenceBeforeMinLimitReached == null || (ConvergenceRsquaredReached > MaxConvergenceBeforeMinLimitReached))
    //                            {
    //                                MaxConvergenceBeforeMinLimitReached = ConvergenceRsquaredReached;
    //                            }
    //                            else
    //                            {
    //                                Debug.WriteLine("{0} model diverged at level {1} from max {2} level so reconvergence is required.",
    //                                    _modelInput.ModelName, ConvergenceRsquaredReached, MaxConvergenceBeforeMinLimitReached);
    //                                //_logger.Info("{0} model diverged at level {1} from max {2} level so reconvergence is required.",
    //                                //    _dataSource, ConvergenceRsquaredReached, MaxConvergenceBeforeMinLimitReached);
    //                                _convergenceCancellationToken.Cancel();
    //                            }
    //                        }
    //                    }
    //                    else
    //                    {
    //                        Debug.WriteLine("{0} model converged at {1} level.", _modelInput.ModelName, ConvergenceRsquaredReached);
    //                        //_logger.Info("{0} model converged at {1} level.", _dataSource, ConvergenceRsquaredReached);
    //                        _convergenceCancellationToken.Cancel();
    //                    }
    //                }
    //                else
    //                {
    //                    if (MaxConvergenceBeforeMinLimitReached.HasValue)
    //                    {
    //                        Debug.WriteLine("{0} model diverged so we need to re-converge at the max {1} level.",
    //                            _modelInput.ModelName, MaxConvergenceBeforeMinLimitReached);
    //                        //_logger.Info("{0} model diverged so we need to re-converge at the max {1} level.",
    //                        //    _dataSource, MaxConvergenceBeforeMinLimitReached);
    //                        _convergenceCancellationToken.Cancel();
    //                    }
    //                }
    //            }
    //            else if (MaxIterationIndex.HasValue)
    //            {
    //                if (IterationIndex == MaxIterationIndex)
    //                {
    //                    ConvergenceRsquaredReached = iterationResult.ValidationMetrics.MicroAccuracy;
    //                    MaxConvergenceBeforeMinLimitReached = ConvergenceRsquaredReached;
    //                    Debug.WriteLine("{0} model re-converged at the max {1} level at Iternation {2}.",
    //                        _modelInput.ModelName, MaxConvergenceBeforeMinLimitReached, MaxIterationIndex);
    //                    //_logger.Info("{0} model re-converged at the max {1} level at Iternation {2}.",
    //                    //    _dataSource, MaxConvergenceBeforeMinLimitReached, MaxIterationIndex);
    //                    _convergenceCancellationToken.Cancel();
    //                }
    //            }

    //        }

    //        //iterationResult.ValidationMetrics.
    //    }
    //}

    public class MulticlassExperimentProgressHandler : IProgress<RunDetail<MulticlassClassificationMetrics>>
    {
        private int _iterationIndex;

        public void Report(RunDetail<MulticlassClassificationMetrics> iterationResult)
        {
            if (_iterationIndex++ == 0)
            {
                ConsoleHelper.PrintMulticlassClassificationMetricsHeader();
            }

            if (iterationResult.Exception != null)
            {
                ConsoleHelper.PrintIterationException(iterationResult.Exception);
            }
            else
            {
                ConsoleHelper.PrintIterationMetrics(_iterationIndex, iterationResult.TrainerName,
                    iterationResult.ValidationMetrics, iterationResult.RuntimeInSeconds);
            }
        }
    }
}

}

acrigney avatar Sep 20 '22 03:09 acrigney

Thanks @acrigney. I'll check out this code and let you know if there's anything else I need.

luisquintanilla avatar Sep 20 '22 18:09 luisquintanilla

Hi @acrigney

Here is a working sample based on the code you provided. I made a few tweaks based on updates we've made to AutoML and PFI.

  1. AutoML pipeline / experiment. The AutoML API used in the pipeline is our new implementation and the one I recommend using going forward.
  2. PFI. I'm using an updated version of PFI which makes it simpler to use.

To get the latest version of the AutoML API, you can install the latest preview version of Microsoft.ML.AutoML from NuGet.

Hope this helps fix the issues you were running into. Let us know if we're okay to close this issue.

luisquintanilla avatar Sep 21 '22 04:09 luisquintanilla

This issue has been marked needs-author-action and may be missing some important information.

ghost avatar Sep 21 '22 20:09 ghost

Awesome mate! This code worked, thanks heaps! I updated the references to the latest preview ones, i.e. 2.0.0-preview.22313.1. Although I had to copy lib_lightgbm.dll and lightgbm.exe into the x64/debug folder as they were not included; I have them from other projects. Maybe you know why that was? Also, why do we have to do the following?

// Transform the dataset.
var transformedData = model.Transform(data);

Also, I am not a fan of the notebook style. I really don't like how people want to use .NET with notebooks; they are so clunky, and we as developers write unit tests, not pieces of code.

using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace PFTTest
{
internal class Program
{
    private class Data
    {
        public float Label { get; set; }

        public float Feature1 { get; set; }

        public float Feature2 { get; set; }
    }

    /// <summary>
    /// Generate an enumerable of Data objects, creating the label as a simple
    /// linear combination of the features.
    /// </summary>
    /// <param name="nExamples">The number of examples.</param>
    /// <param name="bias">The bias, or offset, in the calculation of the
    /// label.</param>
    /// <param name="weight1">The weight to multiply the first feature with to
    /// compute the label.</param>
    /// <param name="weight2">The weight to multiply the second feature with to
    /// compute the label.</param>
    /// <param name="seed">The seed for generating feature values and label
    /// noise.</param>
    /// <returns>An enumerable of Data objects.</returns>
    private static IEnumerable<Data> GenerateData(int nExamples = 10000,
        double bias = 0, double weight1 = 1, double weight2 = 2, int seed = 1)
    {
        var rng = new Random(seed);
        var max = bias + 4.5 * weight1 + 4.5 * weight2 + 0.5;
        for (int i = 0; i < nExamples; i++)
        {
            var data = new Data
            {
                Feature1 = (float)(rng.Next(10) * (rng.NextDouble() - 0.5)),
                Feature2 = (float)(rng.Next(10) * (rng.NextDouble() - 0.5)),
            };

            // Create a noisy label.
            var value = (float)
                (bias + weight1 * data.Feature1 + weight2 * data.Feature2 +
                rng.NextDouble() - 0.5);

            if (value < max / 3)
                data.Label = 0;
            else if (value < 2 * max / 3)
                data.Label = 1;
            else
                data.Label = 2;
            yield return data;
        }
    }
    static void Main(string[] args)
    {
        // Create a new context for ML.NET operations. It can be used for
        // exception tracking and logging, as a catalog of available operations
        // and as the source of randomness.
        var mlContext = new MLContext(seed: 1);

        // Create sample data.
        var samples = GenerateData();

        // Load the sample data as an IDataView.
        var data = mlContext.Data.LoadFromEnumerable(samples);

        // Define a training pipeline that concatenates features into a vector,
        // normalizes them, and then trains a linear model.
        var featureColumns =
            new string[] { nameof(Data.Feature1), nameof(Data.Feature2) };

        var pipeline = mlContext.Transforms
            .Concatenate("Features", featureColumns)
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        //var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: labelColumnName).
        //                  Append(mlContext.Transforms.Concatenate("Features", _features)).AppendCacheCheckpoint(mlContext);

        // Fit the pipeline to the data.
        var model = pipeline.Fit(data);

        // Transform the dataset.
        var transformedData = model.Transform(data);

        var autoMLPipeline = new MultiModelPipeline().Append(mlContext.Auto().MultiClassification());

        var sch = transformedData.Schema;

       // ExperimentResult<MulticlassClassificationMetrics> experimentResult = mlContext.Auto()
       //.CreateMulticlassClassificationExperiment(60)
       //.Execute(transformedData, "Label", progressHandler: new MulticlassExperimentProgressHandler());

        var experiment =
            mlContext.Auto().CreateExperiment()
                .SetPipeline(autoMLPipeline)
                .SetEvaluateMetric(MulticlassClassificationMetric.MicroAccuracy, labelColumn: "Label")
                .SetTrainingTimeInSeconds(60)
                .SetDataset(transformedData, fold: 5);
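
        // Assumption about the new API (not stated in this thread): fold: 5
        // asks the experiment to evaluate each candidate with 5-fold
        // cross-validation over the supplied data view.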

        var expResult = experiment.Run();

        var bestModel = expResult.Model;
        //var concatenatorTransformer = pipeline.Fit(data);
        //var bestModel = concatenatorTransformer.Append(experimentResult.BestRun.Model);

        //var lastTransformer = ((TransformerChain<ITransformer>)bestModel).LastTransformer;

        //var linearPredictor = (bestModel as TransformerChain<ITransformer>).LastTransformer;
        ////var linearPredictor = (bestModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<MaximumEntropyModelParameters>;

        ////var linearPredictor = (ISingleFeaturePredictionTransformer<object>)lastTransformer;

        //var msch = experimentResult.BestRun.Model.GetOutputSchema(sch);
        //var xx = linearPredictor.Preview(data, 4);

        // Compute the permutation metrics for the linear model using the
        // normalized data.
        //var permutationMetrics = mlContext.MulticlassClassification
        //    .PermutationFeatureImportance(linearPredictor, transformedData, labelColumnName: "Label",
        //    permutationCount: 30);

        var pfi =  mlContext.MulticlassClassification.PermutationFeatureImportance(bestModel, transformedData, permutationCount: 3);

        var results = pfi.Select(x => Tuple.Create(x.Key, x.Value.MicroAccuracy.Mean))
                .OrderByDescending(x => x.Item2);
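        // Note: pfi is keyed by feature (slot) name; each value carries summary
        // statistics (mean, standard error) over the permutations for each
        // metric, so ordering by MicroAccuracy.Mean ranks the features.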
        // Gets error Incompatible features column type: 'Vector<Single, 2>' vs 'Vector<Single, 4>'

        // Now let's look at which features are most important to the model
        // overall. Get the feature indices sorted by their impact on
        // microaccuracy.
        //var sortedIndices = permutationMetrics
        //    .Select((metrics, index) => new { index, metrics.MicroAccuracy })
        //    .OrderByDescending(feature => Math.Abs(feature.MicroAccuracy.Mean))
        //    .Select(feature => feature.index);

        //Console.WriteLine("Feature\tChange in MicroAccuracy\t95% Confidence in "
        //    + "the Mean Change in MicroAccuracy");

        //var microAccuracy = permutationMetrics.Select(x => x.MicroAccuracy)
        //    .ToArray();

        //foreach (int i in sortedIndices)
        //{
        //    Console.WriteLine("{0}\t{1:G4}\t{2:G4}",
        //        featureColumns[i],
        //        microAccuracy[i].Mean,
        //        1.96 * microAccuracy[i].StandardError);
        //}
    }

    //public class CustomMulticlassExperimentProgressHandler<T> : IProgress<RunDetail<RegressionMetrics>>
    //{
    //    //private static NLog.Logger _logger = NLog.LogManager.GetCurrentClassLogger();
    //    private CancellationTokenSource _convergenceCancellationToken;
    //    private ModelInput<T> _modelInput;
    //    public int? MaxIterationIndex { get; set; }
    //    public bool UseConvergenceLimit { get; set; }
    //    public int IterationIndex { get; private set; }
    //    public double ConvergenceRsquaredReached { get; private set; }
    //    public double? MaxConvergenceBeforeMinLimitReached { get; private set; }
    //    public double ConvergenceLimit { get; set; }
    //    public uint? MinExperimentTimeInSeconds { get; private set; }

    //    public CustomMulticlassExperimentProgressHandler(CancellationTokenSource convergenceCancellationToken, ModelInput<T> modelInput)
    //    {
    //        _modelInput = modelInput;
    //        _convergenceCancellationToken = convergenceCancellationToken;
    //        UseConvergenceLimit = modelInput.UseConvergenceLimit ?? false;
    //        ConvergenceLimit = modelInput.ConvergenceLimit ?? 0.99;

    //        MinExperimentTimeInSeconds = modelInput.MinExperimentTimeInSeconds;
    //        MaxConvergenceBeforeMinLimitReached = null;
    //    }

    //    public void Report(RunDetail<MulticlassClassificationMetrics> iterationResult)
    //    {
    //        if (IterationIndex++ == 0)
    //        {
    //            ConsoleHelper.PrintMulticlassClassificationMetricsHeader();
    //        }

    //        if (iterationResult.Exception != null)
    //        {
    //            ConsoleHelper.PrintIterationException(iterationResult.Exception);
    //        }
    //        else
    //        {
    //            ConsoleHelper.PrintIterationMetrics(IterationIndex, iterationResult.TrainerName,
    //                iterationResult.ValidationMetrics, iterationResult.RuntimeInSeconds);
    //        }

    //        if ((iterationResult.ValidationMetrics != null) &&
    //            (!double.IsNaN(iterationResult.ValidationMetrics.MicroAccuracy)))
    //        {
    //            if (UseConvergenceLimit)
    //            {
    //                ConvergenceRsquaredReached = iterationResult.ValidationMetrics.MicroAccuracy;

    //                if (ConvergenceRsquaredReached > ConvergenceLimit)
    //                {
    //                    if (MinExperimentTimeInSeconds.HasValue)
    //                    {
    //                        if (iterationResult.RuntimeInSeconds > MinExperimentTimeInSeconds)
    //                        {
    //                            Debug.WriteLine("{0} model converged at specified {1} level after {2} secs.",
    //                                _modelInput.ModelName, ConvergenceLimit, MinExperimentTimeInSeconds);
    //                            //_logger.Info("{0} model converged at specified {1} level after {2} secs.",
    //                            //    _dataSource, ConvergenceLimit, MinExperimentTimeInSeconds);
    //                            _convergenceCancellationToken.Cancel();
    //                        }
    //                        else
    //                        {
    //                            if (MaxConvergenceBeforeMinLimitReached == null || (ConvergenceRsquaredReached > MaxConvergenceBeforeMinLimitReached))
    //                            {
    //                                MaxConvergenceBeforeMinLimitReached = ConvergenceRsquaredReached;
    //                            }
    //                            else
    //                            {
    //                                Debug.WriteLine("{0} model diverged at level {1} from max {2} level so reconvergence is required.",
    //                                    _modelInput.ModelName, ConvergenceRsquaredReached, MaxConvergenceBeforeMinLimitReached);
    //                                //_logger.Info("{0} model diverged at level {1} from max {2} level so reconvergence is required.",
    //                                //    _dataSource, ConvergenceRsquaredReached, MaxConvergenceBeforeMinLimitReached);
    //                                _convergenceCancellationToken.Cancel();
    //                            }
    //                        }
    //                    }
    //                    else
    //                    {
    //                        Debug.WriteLine("{0} model converged at {1} level.", _modelInput.ModelName, ConvergenceRsquaredReached);
    //                        //_logger.Info("{0} model converged at {1} level.", _dataSource, ConvergenceRsquaredReached);
    //                        _convergenceCancellationToken.Cancel();
    //                    }
    //                }
    //                else
    //                {
    //                    if (MaxConvergenceBeforeMinLimitReached.HasValue)
    //                    {
    //                        Debug.WriteLine("{0} model diverged so we need to re-converge at the max {1} level.",
    //                            _modelInput.ModelName, MaxConvergenceBeforeMinLimitReached);
    //                        //_logger.Info("{0} model diverged so we need to re-converge at the max {1} level.",
    //                        //    _dataSource, MaxConvergenceBeforeMinLimitReached);
    //                        _convergenceCancellationToken.Cancel();
    //                    }
    //                }
    //            }
    //            else if (MaxIterationIndex.HasValue)
    //            {
    //                if (IterationIndex == MaxIterationIndex)
    //                {
    //                    ConvergenceRsquaredReached = iterationResult.ValidationMetrics.MicroAccuracy;
    //                    MaxConvergenceBeforeMinLimitReached = ConvergenceRsquaredReached;
    //                    Debug.WriteLine("{0} model re-converged at the max {1} level at Iternation {2}.",
    //                        _modelInput.ModelName, MaxConvergenceBeforeMinLimitReached, MaxIterationIndex);
    //                    //_logger.Info("{0} model re-converged at the max {1} level at Iternation {2}.",
    //                    //    _dataSource, MaxConvergenceBeforeMinLimitReached, MaxIterationIndex);
    //                    _convergenceCancellationToken.Cancel();
    //                }
    //            }

    //        }

    //        //iterationResult.ValidationMetrics.
    //    }
    //}

    public class MulticlassExperimentProgressHandler : IProgress<RunDetail<MulticlassClassificationMetrics>>
    {
        private int _iterationIndex;

        public void Report(RunDetail<MulticlassClassificationMetrics> iterationResult)
        {
            if (_iterationIndex++ == 0)
            {
                ConsoleHelper.PrintMulticlassClassificationMetricsHeader();
            }

            if (iterationResult.Exception != null)
            {
                ConsoleHelper.PrintIterationException(iterationResult.Exception);
            }
            else
            {
                ConsoleHelper.PrintIterationMetrics(_iterationIndex, iterationResult.TrainerName,
                    iterationResult.ValidationMetrics, iterationResult.RuntimeInSeconds);
            }
        }
    }
}

}

acrigney avatar Sep 23 '22 04:09 acrigney

And can you explain why you have the new AutoML pipeline / experiment?

Best Regards,
Alistair

acrigney avatar Sep 23 '22 04:09 acrigney

@acrigney See comments below.

I had to copy the lib_lightgbm.dll and lightgbm.exe into the x64/debug folder as they were not included but I have them from other projects.

I don't think that's something you should have to do. Can you try to repro on a new project and let us know if that's still the case on the new project?

why do we have to do the following?

// Transform the dataset.
var transformedData = model.Transform(data);

I don't think I've done that in the notebook sample. Here's the line I see in the notebook:

var transformedData =  dataPrepTransformer.Transform(idv);

This takes an input IDataView and outputs a new IDataView with the applied transforms you defined in the pipeline. In this case it's the data preprocessing pipeline.

In the line you've shared, it looks like it's doing the same, except you've called it model instead of dataPrepTransformer.

The reason I did this is the original pipeline you shared is split into two (data preprocessing and training), so I followed your convention. You're more than welcome to have a single pipeline that combines the preprocessing and AutoML components.

i.e.

var pipeline = 
    mlContext.Transforms.Concatenate("Features", featureColumns)
        .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
        .Append(mlContext.Transforms.NormalizeMinMax("Features"))
        .Append(mlContext.Auto().MultiClassification());

var experiment = 
    mlContext.Auto().CreateExperiment()
        .SetPipeline(pipeline)
        .SetEvaluateMetric(MulticlassClassificationMetric.MicroAccuracy, labelColumn: "Label")
        .SetTrainingTimeInSeconds(60)
        .SetDataset(transformedData, fold: 5);

In this case, the preprocessing and model training are all just one pipeline.
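
One caveat worth flagging (my assumption, not something verified above): since the combined pipeline already contains the preprocessing, you would presumably pass the raw IDataView to SetDataset so the transforms are not applied twice, e.g.:

// With preprocessing folded into the AutoML pipeline, feed the raw data
// rather than the already-transformed view (sketch under that assumption).
var experiment =
    mlContext.Auto().CreateExperiment()
        .SetPipeline(pipeline)
        .SetEvaluateMetric(MulticlassClassificationMetric.MicroAccuracy, labelColumn: "Label")
        .SetTrainingTimeInSeconds(60)
        .SetDataset(data, fold: 5);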

Also I am not a fan of the notebook style

Thanks for the feedback. How would you use notebooks? What makes them so clunky in your opinion?

explain why you have the new AutoML pipeline / experiment?

This is our new implementation. Starting with ML.NET 2.0 that's what the AutoML API will look like and going forward it's the one we recommend you use.

In addition to being able to explore more models in the same amount of time, it also provides some more advanced features if you want more control over how AutoML finds the best model without sacrificing much of the original simplicity when you accept the defaults.

luisquintanilla avatar Sep 23 '22 17:09 luisquintanilla

Could you explain how to interpret the calculated PFI values?

Is most important to least important represented by positive mean values down to negative, or the reverse?

aforoughi1 avatar Sep 26 '22 13:09 aforoughi1

Totally awesome, thanks so much for your very detailed response! Brilliant! The reason I am not a fan of notebooks is that they are really for exploration; we as software engineers like to refactor and build reusable code. I have abstracted most of the ML.NET libraries into a single framework so I don't have to copy and paste as all data scientists do! I must plug the new interface into my framework!!

acrigney avatar Sep 27 '22 01:09 acrigney

@luisquintanilla can you explain how to better understand what PFI outputs?
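
In the meantime, a rough pointer rather than an authoritative answer: PFI reports the change in each metric after permuting a feature, so for accuracy-style metrics the most important features are the ones with the largest-magnitude (typically most negative) mean change. The docs samples rank features by the absolute mean, along the lines of:

// Features whose permutation hurts MicroAccuracy the most rank first
// (sketch against the dictionary-returning PFI overload used above).
var ranked = pfi.OrderByDescending(kv => Math.Abs(kv.Value.MicroAccuracy.Mean));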

Closing this issue for now though, as it has been resolved.

michaelgsharp avatar Oct 10 '22 19:10 michaelgsharp