
LightGBM Multiclassification trainer returning error code -1 "Number of classes should be specified and greater than 1 for multiclass training" but I can't see where to specify the number of classes

Open raymond130 opened this issue 2 years ago • 3 comments

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: 2.0.0
  • .NET Version: .NET 6.0

Describe the bug: This bug occurs when I call transformationPipeline.Fit(data) with the LightGbm trainer after filling out the appropriate options. I get the error "LightGBM Error, code is -1, error message is 'Number of classes should be specified and greater than 1 for multiclass training'."

I tried to locate the source of this error in the source code, but I can't figure out how to define the number of classes in the trainer. My label column is one-hot encoded, so it should have two classes if I've interpreted the documentation correctly. I'm not sure where the error is coming from.

To Reproduce: Run the LightGbmMulticlass trainer and try to train with it.

Expected behavior: The model should train properly.

Screenshots, Code, Sample Projects: Below are two pictures of my code where I define the expected labels and features, and where I pass in the data I use. [image] [image]

Here is the code written out:

For the PrepareData method:

`IEstimator<ITransformer> dataPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(PrMaintenanceClass.failure))
        //encode model column
        .Append(mlContext.Transforms.Categorical.OneHotEncoding("model", outputKind: OneHotEncodingEstimator.OutputKind.Indicator))

        //define features column
        .Append(mlContext.Transforms.Concatenate("Features",
        // 
        nameof(PrMaintenanceClass.voltmean_3hrs), nameof(PrMaintenanceClass.rotatemean_3hrs),
        nameof(PrMaintenanceClass.pressuremean_3hrs), nameof(PrMaintenanceClass.vibrationmean_3hrs),
        nameof(PrMaintenanceClass.voltstd_3hrs), nameof(PrMaintenanceClass.rotatestd_3hrs),
        nameof(PrMaintenanceClass.pressurestd_3hrs), nameof(PrMaintenanceClass.vibrationstd_3hrs),
        nameof(PrMaintenanceClass.voltmean_24hrs), nameof(PrMaintenanceClass.rotatemean_24hrs),
        nameof(PrMaintenanceClass.pressuremean_24hrs),
        nameof(PrMaintenanceClass.vibrationmean_24hrs),
        nameof(PrMaintenanceClass.voltstd_24hrs), nameof(PrMaintenanceClass.rotatestd_24hrs),
        nameof(PrMaintenanceClass.pressurestd_24hrs), nameof(PrMaintenanceClass.vibrationstd_24hrs),
        nameof(PrMaintenanceClass.error1count), nameof(PrMaintenanceClass.error2count),
        nameof(PrMaintenanceClass.error3count), nameof(PrMaintenanceClass.error4count),
        nameof(PrMaintenanceClass.error5count), nameof(PrMaintenanceClass.sincelastcomp1),
        nameof(PrMaintenanceClass.sincelastcomp2), nameof(PrMaintenanceClass.sincelastcomp3),
        nameof(PrMaintenanceClass.sincelastcomp4),
        nameof(PrMaintenanceClass.model), nameof(PrMaintenanceClass.age)));

        return dataPipeline;`

And for the train method:

` var transformationPipeline = PrepareData(mlContext);

        //settings hyper parameters
        TrainerOptions = new LightGbmMulticlassTrainer.Options();
        TrainerOptions.FeatureColumnName = "Features";
        TrainerOptions.LabelColumnName = "Label";
        TrainerOptions.LearningRate = 0.005;
        TrainerOptions.NumberOfLeaves = 70;
        TrainerOptions.NumberOfIterations = 2000;
        TrainerOptions.NumberOfLeaves = 50;
        TrainerOptions.UnbalancedSets = true;
        TrainerOptions.Sigmoid = 0.2;
        //
        var boost = new DartBooster.Options();
        boost.XgboostDartMode = true;
        boost.MaximumTreeDepth = 25;
        TrainerOptions.Booster = boost;

        // Define LightGbm algorithm estimator
        IEstimator<ITransformer> lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm(TrainerOptions);

        //train the ML model
        TransformerChain<ITransformer> model = transformationPipeline.Append(lightGbm).Fit(preparedData);

        //return trained model for evaluation
        return model;`

Additional context: I hope this can help! I feel like I made a simple error.

raymond130, Sep 27 '23 20:09

Please check the distinct values of PrMaintenanceClass.failure in preparedData.

feiyun0112, Oct 03 '23 05:10

@raymond130 I got this to work using the IrisData set (essentially just a modification of the MulticlassClassification_Iris sample using your code):

        private static void BuildTrainEvaluateAndSaveModelOneHot(MLContext mlContext)
        {
            var trainingDataView = mlContext.Data.LoadFromTextFile<IrisData>(TrainDataPath, hasHeader: true);
            var testDataView = mlContext.Data.LoadFromTextFile<IrisData>(TestDataPath, hasHeader: true);


            var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "KeyColumn", inputColumnName: nameof(IrisData.Label))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("KeyColumn", outputKind: OneHotEncodingEstimator.OutputKind.Key))
                .Append(mlContext.Transforms.Concatenate("Features", nameof(IrisData.SepalLength), nameof(IrisData.SepalWidth), nameof(IrisData.PetalLength), nameof(IrisData.PetalWidth))
                .AppendCacheCheckpoint(mlContext));

            //settings hyper parameters
            var TrainerOptions = new LightGbmMulticlassTrainer.Options();
            TrainerOptions.FeatureColumnName = "Features";
            TrainerOptions.LabelColumnName = "KeyColumn";
            TrainerOptions.LearningRate = 0.005;
            TrainerOptions.NumberOfLeaves = 70;
            TrainerOptions.NumberOfIterations = 2000;
            TrainerOptions.NumberOfLeaves = 50;
            TrainerOptions.UnbalancedSets = true;
            TrainerOptions.Sigmoid = 0.2;

            var boost = new DartBooster.Options();
            boost.XgboostDartMode = true;
            boost.MaximumTreeDepth = 25;
            TrainerOptions.Booster = boost;

            // Define LightGbm algorithm estimator
            IEstimator<ITransformer> lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm(TrainerOptions);
            var transformationPipeline = dataProcessPipeline.Append(lightGbm);

            //train the ML model
            TransformerChain<ITransformer> trainedModel = transformationPipeline.Fit(trainingDataView);

            // evaluate the model and show accuracy stats
            Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");
            var predictions = trainedModel.Transform(testDataView);
            var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "Label", "Score");

            Common.ConsoleHelper.PrintMultiClassClassificationMetrics(lightGbm.ToString(), metrics);

            // Save/persist the trained model to a .ZIP file
            mlContext.Model.Save(trainedModel, trainingDataView.Schema, ModelPathLightGbm);
            Console.WriteLine("The model is saved to {0}", ModelPathLightGbm);
        }

Note a couple of things that I had to change from your code: the outputColumnName of the first transform had to be the input column name to OneHotEncoding, and it also needs to match TrainerOptions.LabelColumnName.

tearlant, Nov 09 '23 19:11

Hi there - I was able to get this fixed! @feiyun0112 was correct - my dataset had an improperly marked failure class that contained all the same values. @tearlant thank you for the additional help with the proofreading!

Thank you all so much!

raymond130, Jan 22 '24 21:01
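
For anyone hitting the same error, the check suggested in this thread (looking at the distinct values of the label column before training) could be sketched roughly as follows. This is only an illustration, not code from the thread: `LabelCheck`, `CountByLabel`, and `HasEnoughClasses` are hypothetical helper names, and the sketch assumes the label values are available as strings. With ML.NET they could probably be pulled with `preparedData.GetColumn<string>(nameof(PrMaintenanceClass.failure))` from the `Microsoft.ML.Data` namespace, assuming `failure` is a string column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of ML.NET): inspects label values before training.
public static class LabelCheck
{
    // Counts how many rows carry each distinct label value.
    // Null labels are grouped under the placeholder "<null>".
    public static Dictionary<string, int> CountByLabel(IEnumerable<string> labels) =>
        labels.GroupBy(l => l ?? "<null>")
              .ToDictionary(g => g.Key, g => g.Count());

    // A multiclass trainer needs at least two distinct label values;
    // a column holding a single repeated value yields only one class.
    public static bool HasEnoughClasses(IEnumerable<string> labels) =>
        CountByLabel(labels).Count > 1;
}
```

If `HasEnoughClasses` returns false for the label column, the training data contains only one class, which matches the cause reported above for the "Number of classes should be specified and greater than 1" error.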