
I sometimes get this with the regression example: System.InvalidOperationException: Cannot hold covariance matrix in memory

Open tkdogan opened this issue 4 years ago • 3 comments

Any idea what could be causing this? My training data is not very big: 40K rows.

Running AutoML regression experiment for 60 seconds...
| Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration |
|1 LightGbmRegression 0.9930 140404.81 42178224023.09 205373.38 9.2 |
|2 FastTreeRegression 0.9934 136725.85 39940180886.08 199850.40 8.8 |
|3 FastTreeTweedieRegression 0.9913 142834.10 52816241323.19 229817.84 9.2 |
|4 FastForestRegression 0.9091 538589.21 551116053939.22 742371.91 9.4 |
Exception during AutoML iteration: System.InvalidOperationException: Cannot hold covariance matrix in memory with 94459 features
at Microsoft.ML.Trainers.OlsTrainer.TrainCore(IChannel ch, Factory cursorFactory, Int32 featureCount)
at Microsoft.ML.Trainers.OlsTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
|6 LightGbmRegression 0.9295 447071.45 427662770504.29 653959.30 9.4 |
|7 FastTreeRegression 0.2663 1336968.13 4450480773603.34 2109616.26 7.0 |
|8 FastTreeTweedieRegression 0.9923 133275.19 46493327388.81 215623.11 12.2 |

Top models ranked by R-Squared --
| Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration |
|1 FastTreeRegression 0.9934 136725.85 39940180886.08 199850.40 8.8 |
|2 LightGbmRegression 0.9930 140404.81 42178224023.09 205373.38 9.2 |
|3 FastTreeTweedieRegression 0.9923 133275.19 46493327388.81 215623.11 12.2 |

tkdogan, Sep 15 '20 12:09

Hi @tkdogan

Sorry you ran into this. Which example do you get the error on?

luisquintanilla, Nov 05 '20 14:11

The ordinary least squares (OLS) regression trainer is memory-bound: it allocates O(N^2) memory, where N is the number of features: https://github.com/dotnet/machinelearning/blob/712c3ec0745f45b93e394f8e333deaa5da4f2737/src/Microsoft.ML.Mkl.Components/OlsLinearRegression.cs#L180-L182
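As a rough illustration of that quadratic growth, the sketch below estimates how large such a matrix would be for the 94459 features reported in the trace. It is back-of-the-envelope arithmetic only, written in Python for convenience. It assumes a lower-triangular matrix of 8-byte doubles with one extra row and column for a bias term, stored in a single array indexed by a 32-bit integer. Those assumptions and the function names are hypothetical and are not taken from the trainer's source.

```python
# Back-of-the-envelope estimate of covariance-matrix size for an OLS-style trainer.
# Assumptions (not confirmed against the ML.NET source): lower-triangular storage,
# one extra row/column for a bias term, 8-byte doubles, 32-bit array indexing.

INT32_MAX = 2**31 - 1


def triangular_entries(feature_count: int) -> int:
    """Number of entries in the lower triangle of an (N+1) x (N+1) matrix."""
    m = feature_count + 1  # +1 for the assumed bias term
    return m * (m + 1) // 2


def estimate(feature_count: int, bytes_per_entry: int = 8) -> dict:
    """Return entry count, size in GiB, and whether a 32-bit index could address it."""
    entries = triangular_entries(feature_count)
    return {
        "entries": entries,
        "gib": entries * bytes_per_entry / 2**30,
        "fits_int32_index": entries <= INT32_MAX,
    }


if __name__ == "__main__":
    report = estimate(94459)  # feature count from the exception message
    print(f"entries: {report['entries']:,}")
    print(f"size:    {report['gib']:.1f} GiB")
    print(f"fits in a 32-bit indexed array: {report['fits_int32_index']}")
```

Under these assumptions the matrix needs tens of GiB and more entries than a 32-bit index can address, regardless of how few rows the training set has. The feature count matters here, not the row count, which is consistent with the problem appearing on a 40K-row dataset.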

I wouldn't be concerned: the AutoML code ignores the failure of that model and continues optimizing the remaining trainers.

If you're using the AutoML API, you can add a feature selection step as a pre-featurizer -- https://github.com/dotnet/machinelearning-samples/blob/5831bdd9bea8e42e1d3e4967486653a5df1abe4c/samples/csharp/getting-started/AdvancedExperiment_AutoML/README.md#step-3-add-a-pre-featurizer

justinormont, Nov 05 '20 15:11

I ended up reducing dimensionality and the problem went away. Thanks for explaining this; it's good to know that AutoML will ignore those failures.

tkdogan, Nov 05 '20 23:11