Memory and SQLDataSource

Open • Kastanek opened this issue 2 years ago • 1 comment

Ask the question
Is training a model using SQLDataSource suitable for large datasets that do not fit in RAM? I expect my dataset to grow to hundreds of thousands of records. I see that batching is performed, but I'm not sure whether a model can be trained this way. I'm particularly interested in training with XGBoostRegressionTrainer.

Is your question about a specific Tribuo class?
SQLDataSource

Kastanek, May 15 '23 09:05

We have trained XGBoost models in Tribuo on hundreds of thousands of records, though we used a fairly large machine to do so. The batched loading from the SQL DB isn't the relevant part: Tribuo requires all of the data to be in memory before it can train a model.

Craigacp, May 15 '23 13:05
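To illustrate why batching doesn't help here, below is a minimal plain-Java sketch. It uses no Tribuo classes, and `fetchBatch`, `loadAll` and `rawValueBytes` are hypothetical helpers. Rows come from a stand-in "database" in fixed-size batches, so each round trip is small, but every batch is kept, so the whole dataset still ends up on the heap before any trainer could run. The byte estimate is only a lower bound for the raw `double` values. Per-example objects and feature names add more on top of it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: batched fetching bounds the size of each round trip,
 * not the total memory held once the training set is assembled.
 * fetchBatch is a hypothetical stand-in for a database cursor.
 */
public class BatchedLoadSketch {

    /** Largest batch returned by fetchBatch so far (for illustration). */
    static int largestBatchSeen = 0;

    /** Hypothetical stand-in for one database round trip: returns up to batchSize rows. */
    static List<double[]> fetchBatch(int offset, int batchSize, int totalRows, int numFeatures) {
        int end = Math.min(offset + batchSize, totalRows);
        List<double[]> batch = new ArrayList<>(end - offset);
        for (int row = offset; row < end; row++) {
            double[] features = new double[numFeatures];
            for (int f = 0; f < numFeatures; f++) {
                features[f] = row + 0.1 * f; // synthetic values
            }
            batch.add(features);
        }
        largestBatchSeen = Math.max(largestBatchSeen, batch.size());
        return batch;
    }

    /** Fetches in batches but retains every row, so the full dataset is materialized. */
    static List<double[]> loadAll(int totalRows, int batchSize, int numFeatures) {
        List<double[]> dataset = new ArrayList<>();
        for (int offset = 0; offset < totalRows; offset += batchSize) {
            dataset.addAll(fetchBatch(offset, batchSize, totalRows, numFeatures));
        }
        return dataset;
    }

    /** Lower bound for the raw feature values alone: 8 bytes per double, no object or name overhead. */
    static long rawValueBytes(long rows, long numFeatures) {
        return rows * numFeatures * Double.BYTES;
    }

    public static void main(String[] args) {
        List<double[]> data = loadAll(1000, 128, 5);
        System.out.println("rows held in memory: " + data.size());      // 1000
        System.out.println("largest single batch: " + largestBatchSeen); // 128
        // Hypothetical sizing: 500,000 rows x 100 dense features.
        System.out.println("raw value bytes: " + rawValueBytes(500_000, 100)); // 400000000
    }
}
```

With those hypothetical numbers (500,000 rows, 100 dense features), the raw values alone come to about 400 MB. The real footprint will be larger, so sizing the JVM heap (`-Xmx`) for the full dataset is what matters, not the fetch batch size.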