Unable to remove SdcaLogisticRegressionOva from AutoML Multiclassification Experiment

Open bettwedder opened this issue 1 year ago • 0 comments

System Information (please complete the following information):

  • OS & Version: Windows 11
  • ML.NET Version: v3.0.1 & AutoML 0.21.1
  • .NET Version: 8.0

Describe the bug When creating an AutoML multiclass classification experiment, the trainer "SdcaLogisticRegressionOva" cannot be removed from the experiment.

To Reproduce Steps to reproduce the behavior:

  1. Create a Multiclass experiment settings object
  2. Iterate over settings.Trainers and remove every trainer that is not "LightGbm" or "FastForest"
  3. Create a multiclass progress reporter that outputs the TrainerName used for each run.
  4. Use this Replace chain to strip the extra text from the TrainerName value, which is currently bugged in ML.NET 3.0.1 and AutoML 0.21.1:
    TrainerName.Replace("Multi", "").Replace("ReplaceMissingValues", "").Replace("Concatenate", "").Replace("Unknown", "").Replace("=>", "");
  5. Run experiment and monitor names.
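
Steps 3 and 4 could be sketched roughly as follows. The MulticlassProgressReporter used in this issue is not shown, so `TrainerNameReporter` and `TrainerNameCleaner` are hypothetical names; the sketch assumes only that the reporter implements `IProgress<RunDetail<MulticlassClassificationMetrics>>`, which is the handler type the Execute call accepts.

```csharp
using System;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper: applies the Replace chain from step 4 to a raw TrainerName.
public static class TrainerNameCleaner
{
    public static string Clean(string rawName) =>
        rawName.Replace("Multi", "")
               .Replace("ReplaceMissingValues", "")
               .Replace("Concatenate", "")
               .Replace("Unknown", "")
               .Replace("=>", "");
}

// Hypothetical reporter: prints the cleaned trainer name of every completed run.
public sealed class TrainerNameReporter : IProgress<RunDetail<MulticlassClassificationMetrics>>
{
    private int _iteration;

    public void Report(RunDetail<MulticlassClassificationMetrics> value)
    {
        _iteration++;
        // TrainerName is defined on the RunDetail base class.
        string name = TrainerNameCleaner.Clean(value.TrainerName ?? string.Empty);
        Console.WriteLine($"Run {_iteration}: {name}");
    }
}
```

An instance would be passed as the progress handler argument of experiment.Execute, in the position where the code below passes its own MulticlassProgressReporter.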

Expected behavior Only the trainers left in settings.Trainers should be used. Instead, one of the first three models uses the unremovable trainer.

Screenshots, Code, Sample Projects

  
               MulticlassExperimentSettings settings = new MulticlassExperimentSettings()
                {
                    OptimizingMetric = optimizeMetric,
                    MaxExperimentTimeInSeconds = experimentTime,
                    CacheDirectoryName = cacheDir,
                    CancellationToken = cts.Token,
                    CacheBeforeTrainer = CacheBeforeTrainer.On
                    
                };

                bool keptLightGBM = false;
                foreach (var trainer in settings.Trainers.ToList())
                {

                    if (!trainer.ToString().ToUpperInvariant().Contains("LIGHTGBM") && !trainer.ToString().ToUpperInvariant().Contains("FASTFOREST"))
                    {
                        settings.Trainers.Remove(trainer);
                        Console.WriteLine("Removed Trainer: " + trainer.ToString());
                    }
                    //else
                    //{
                    //    if (keptLightGBM)
                    //    {
                    //        settings.Trainers.Remove(trainer);
                    //        Console.WriteLine("Removed Extra \"LightGbm\" Trainer: " + trainer.ToString());
                    //    }
                    //    else
                    //        keptLightGBM = true;
                    //}
                }

                MulticlassClassificationExperiment experiment = context.Auto().CreateMulticlassClassificationExperiment(settings);
                ExperimentResult<MulticlassClassificationMetrics> result;

                result = experiment.Execute(trainData, splitTestData, columnInformation, null, new MulticlassProgressReporter() { labelColumnName = label, CacheDir = cacheDir, ExperimentTime = DateTime.Now });

This code produces this output:

[image: screenshot of the output]

Additional context If you leave a single LightGbm entry as the only trainer, AutoML still uses "SdcaLogisticRegressionOva" every other time.
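
A minimal sketch of that single-trainer setup, assuming settings.Trainers is the `ICollection<MulticlassClassificationTrainer>` exposed by MulticlassExperimentSettings. Clearing the collection and adding one enum value avoids matching on ToString() and also drops any duplicate entries. It is meant only to make the reproduction unambiguous, not as a fix for the behavior reported here.

```csharp
using System;
using Microsoft.ML.AutoML;

var settings = new MulticlassExperimentSettings();

// Drop every default entry, including any duplicates, then add back one trainer.
settings.Trainers.Clear();
settings.Trainers.Add(MulticlassClassificationTrainer.LightGbm);

// Print what is left so the configured list can be compared with the trainers actually run.
Console.WriteLine(string.Join(", ", settings.Trainers));
```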

The trainer "SdcaLogisticRegressionOva" does not appear in settings.Trainers after the settings object is created, even though that list is supposed to be populated with all available trainers. Also, iterating over the auto-populated list shows two items named "LightGbm".
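
One way to check both observations is to group the default entries by value and print them next to the values the enum declares. This is a sketch that uses only MulticlassExperimentSettings, the MulticlassClassificationTrainer enum, and LINQ; no output is shown because it depends on the installed package version.

```csharp
using System;
using System.Linq;
using Microsoft.ML.AutoML;

var settings = new MulticlassExperimentSettings();

// Group the default entries by enum value so repeated entries stand out.
foreach (var group in settings.Trainers.GroupBy(t => t))
{
    Console.WriteLine($"default list: {group.Key} x{group.Count()}");
}

// List every value the enum itself declares, for comparison.
foreach (var name in Enum.GetNames(typeof(MulticlassClassificationTrainer)))
{
    Console.WriteLine($"enum value: {name}");
}
```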

Lastly, when I peek at the definition of Microsoft.ML.AutoML.MulticlassClassificationTrainer, the enum also does not contain "SdcaLogisticRegressionOva":

// Decompiled with JetBrains decompiler
// Type: Microsoft.ML.AutoML.MulticlassClassificationTrainer
// Assembly: Microsoft.ML.AutoML, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
// MVID: 5D7A79B7-CF20-433B-A534-1ED92C335230
// Assembly location: C:\Users\xxxx\.nuget\packages\microsoft.ml.automl\0.21.1\lib\netstandard2.0\Microsoft.ML.AutoML.dll
// XML documentation location: C:\Users\xxxx\.nuget\packages\microsoft.ml.automl\0.21.1\lib\netstandard2.0\Microsoft.ML.AutoML.xml

#nullable disable
namespace Microsoft.ML.AutoML
{
  /// <summary>
  /// Enumeration of ML.NET multiclass classification trainers used by AutoML.
  /// </summary>
  public enum MulticlassClassificationTrainer
  {
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer" />.
    /// </summary>
    FastForestOva,
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer" />.
    /// </summary>
    FastTreeOva,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.LightGbm.LightGbmMulticlassTrainer" />.
    /// </summary>
    LightGbm,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.LbfgsMaximumEntropyMulticlassTrainer" />.
    /// </summary>
    LbfgsMaximumEntropy,
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.LbfgsLogisticRegressionBinaryTrainer" />.
    /// </summary>
    LbfgsLogisticRegressionOva,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.SdcaMaximumEntropyMulticlassTrainer" />.
    /// </summary>
    SdcaMaximumEntropy,
  }
}

bettwedder, Feb 23 '24 08:02