machinelearning
OneDAL FastForest training has an "Array dimensions exceeded supported range" exception
System Information (please complete the following information):
- OS & Version: Linux Alpine
- ML.NET Version: ML.NET 3.0
- .NET Version: .NET 5.0
Describe the bug
Array dimensions exceeded supported range.
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.AddWithResize(T item)
at Microsoft.ML.OneDal.OneDalUtils.GetTrainData(IChannel channel, Factory cursorFactory, List`1& featuresList, List`1& labelsList, Int32 numberOfFeatures)
at Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer.TrainCoreOneDal(IChannel ch, Factory cursorFactory, Int32 featureCount)
at Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.BinaryClassificationRunner.Run(TrialSettings settings)
at Microsoft.ML.AutoML.BinaryClassificationRunner.RunAsync(TrialSettings settings, CancellationToken ct)
at Microsoft.ML.AutoML.AutoMLExperiment.RunAsync(CancellationToken ct)
We have seen this error many times in our internal ML.NET logs. It does not appear to be related to training set size: the run this trace was copied from had only 20,000 training rows.
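A rough size check may help narrow this down. The sketch below is not ML.NET code. It assumes (unconfirmed, inferred only from the `featuresList` and `numberOfFeatures` parameters in the trace) that `GetTrainData` appends every feature value of every row to a single list, so the list length would be rows × features. `FlattenedSizeCheck` and its methods are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of ML.NET or Microsoft.ML.OneDal.
static class FlattenedSizeCheck
{
    // Number of elements one flattened list would hold, ASSUMING one entry
    // per (row, feature) pair. Computed in long to avoid Int32 overflow.
    public static long FlattenedLength(long rows, long features) => rows * features;

    // Int32.MaxValue is an upper bound on List<T>.Capacity; the actual
    // runtime limit on array length is slightly lower.
    public static bool FitsInOneList(long rows, long features) =>
        FlattenedLength(rows, features) <= int.MaxValue;

    static void Main()
    {
        const long rows = 20_000; // row count quoted in this report

        // Largest feature count whose flattened length stays within Int32.
        long maxFeatures = int.MaxValue / rows;
        Console.WriteLine($"Max features for {rows} rows: {maxFeatures}");

        // Feature counts chosen only to show both sides of the limit.
        Console.WriteLine(FitsInOneList(rows, 100_000));
        Console.WriteLine(FitsInOneList(rows, 110_000));
    }
}
```

If that assumption holds, a wide feature vector (for example after one-hot or text featurization) could hit the limit even with few rows, which would be consistent with the error not tracking row count. Logging the feature vector length before training would confirm or rule this out.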
Still happens, here with regression:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.FastTree.Dataset.MapFeatureToFlockAndSubFeature(Int32 feature, Int32& flock, Int32& subfeature)
at Microsoft.ML.Trainers.FastTree.InternalRegressionTree.PopulateThresholds(Dataset dataset)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainCoreOneDal(IChannel ch, Factory cursorFactory, Int32 featureCount)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
Package: Microsoft.ML.OneDal 0.22.0-preview.24271.1
Is there any benchmark showing that oneDAL with ML.NET is actually faster (when it works)?