machinelearning
ML.Net: System.OutOfMemoryException: 'Exception of type 'System.OutOfMemoryException' was thrown.' on small dataset
System Information (please complete the following information):
- OS & Version: Windows 10
- ML.NET Version: Microsoft.ML 3.0.1
- .NET Version: 6.0
Describe the bug
Training a model fails with an out-of-memory exception, even though the PC never uses more than 20% of its memory. The project is built for Any CPU.
To Reproduce
Steps to reproduce the behavior:
- Load a text-based data set with 700,000 rows (60 MB) and 2 columns (feature and label)
- Run Transforms.Conversion.MapValueToKey for the Label
- Run Transforms.Text.FeaturizeText on the Features
- Append an mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features") trainer
- Attempt to Fit the model
- Receive an out of memory exception on trainingPipeline.Fit(trainData)
System.OutOfMemoryException
HResult=0x8007000E
Message=Exception of type 'System.OutOfMemoryException' was thrown.
Source=Microsoft.ML.Core
StackTrace:
at Microsoft.ML.Internal.Utilities.VBufferUtils.CreateDense[T](Int32 length)
at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
at Microsoft.ML.Trainers.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Program.<Main>$(String[] args) in C:\Ml.Product.2\Ml.Product.2\Program.cs:line 29
Expected behavior
A 60 MB CSV with 700,000 rows is not a huge amount of data, in my opinion. My machine has 32 GB of memory, and usage stays below 20% when I monitor it during training. I also tried a 64-bit release build and still hit the out-of-memory exception. Could someone advise me on what I am doing wrong, or is this a bug? Eventually I want to train on much larger data sets, which ML.NET should surely be able to handle.
Screenshots, Code, Sample Projects
MLContext _mlContext;
PredictionEngine<MlProduct, MlProductPrediction> _predictionEngine;
ITransformer _trainedModel;
IDataView _trainingDataView;
_mlContext = new MLContext();
_trainingDataView = LoadDataFromCSV();
TrainTestData dataSplit = _mlContext.Data.TrainTestSplit(_trainingDataView, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;
var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "CategoryName", outputColumnName: "Label")
    .Append(_mlContext.Transforms.Text.FeaturizeText("Features", "ProductName"));
var trainingPipeline = pipeline.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
_trainedModel = trainingPipeline.Fit(trainData);
IDataView transformTest = _trainedModel.Transform(testData);
public class MlProduct
{
    [LoadColumn(0)]
    [ColumnName("ProductName")]
    public string ProductName { get; set; }
    [LoadColumn(1)]
    [ColumnName("CategoryName")]
    public string CategoryName { get; set; }
}
public class MlProductPrediction
{
    [ColumnName("PredictedLabel")]
    public string CategoryName;
    [ColumnName("PredictionScore")]
    public float Score { get; set; }
}