grf
Estimating memory usage
Is there a way, or a rule of thumb, to estimate memory usage prior to running grf? I have a dataset of 200k observations and twenty covariates for a job running on a university cluster. 60GB of memory wasn't enough for 10,000 trees and the job crashed. Requesting more memory isn't costless, so I'd like to benchmark in advance how much is actually needed, or at least get some kind of upper limit, to avoid further failed jobs. Thanks a lot!
Memory scales linearly in the number of trees, so you can fit a forest with num.trees = 1 (and ci.group.size = 1) and extrapolate. Bumping up min.node.size (shallower trees) or reducing sample.fraction also helps on huge data sets.
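A minimal sketch of that extrapolation, assuming a causal_forest (the thread doesn't say which forest type is being fit) and simulated data at the question's scale. It fits two small forests that differ only in num.trees, so the fixed overhead of the forest object (e.g. the stored copy of the training data) cancels out of the per-tree estimate:

```r
library(grf)

# Hypothetical data at the question's scale: 200k observations, 20 covariates.
n <- 200000
p <- 20
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] * W + rnorm(n)

# Two small forests differing only in num.trees; the difference in object
# size divided by the difference in tree count gives a per-tree cost.
f1  <- causal_forest(X, Y, W, num.trees = 1,  ci.group.size = 1)
f51 <- causal_forest(X, Y, W, num.trees = 51, ci.group.size = 1)

bytes.fixed    <- as.numeric(object.size(f1))
bytes.per.tree <- (as.numeric(object.size(f51)) - bytes.fixed) / 50

# Linear extrapolation to the full 10,000-tree forest.
estimated.gb <- (bytes.fixed + bytes.per.tree * 10000) / 1024^3
cat(sprintf("Estimated forest size: %.1f GB\n", estimated.gb))
```

Note that object.size() only measures the final forest object, not the peak memory used during training (intermediate buffers and parallel threads add overhead), so treat the estimate as a lower bound and leave some headroom. If you plan to raise min.node.size or lower sample.fraction for the real run, use the same settings in this test so the per-tree estimate reflects them.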