
Estimating memory usage

Open · pasony opened this issue 3 years ago · 1 comment

Is there a way, or a rule of thumb, to estimate memory usage prior to running grf? I have a dataset of 200k observations and twenty covariates for a job running on a university cluster. 60GB of memory wasn't enough for 10,000 trees and the job crashed. Requesting more memory isn't costless, so I'd like to benchmark in advance how much is actually needed, or at least an upper limit, to avoid further failed jobs. Thanks a lot!

pasony · May 30 '22 23:05

Memory scales linearly in the number of trees, so you can fit a forest with num.trees = 1 (and ci.group.size = 1) and extrapolate from its size. Bumping up min.node.size (which gives shallower trees) or reducing sample.fraction also helps on huge data sets.

erikcs · May 31 '22 00:05
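
To make the suggested extrapolation concrete, here is a minimal R sketch (an editorial illustration, not from the thread; the simulated `X`, `Y`, `W` stand in for the poster's 200k × 20 data set). Fitting at two small tree counts and taking the slope separates the per-tree cost from the fixed overhead, such as the stored copy of the training data, which a single-tree fit would otherwise multiply by 10,000:

```r
library(grf)

# Simulated stand-in for the poster's data: 200k rows, 20 covariates.
n <- 200000
p <- 20
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] * W + rnorm(n)

# Fit two small forests. Both objects carry the same fixed overhead
# (e.g. the stored training data), so the size difference between
# them isolates the per-tree cost.
f1  <- causal_forest(X, Y, W, num.trees = 1,  ci.group.size = 1)
f11 <- causal_forest(X, Y, W, num.trees = 11, ci.group.size = 1)

s1  <- as.numeric(object.size(f1))
s11 <- as.numeric(object.size(f11))
per.tree <- (s11 - s1) / 10
fixed    <- s1 - per.tree

# Extrapolate linearly to the target forest size.
target <- 10000L
est.gb <- (fixed + per.tree * target) / 1024^3
cat(sprintf("Estimated fitted-object size for %d trees: ~%.1f GB\n",
            target, est.gb))
```

Note that `object.size()` measures only the fitted R object; peak memory while training (working buffers across threads) will be somewhat higher, so treat the extrapolated figure as a lower bound and request headroom on top of it.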