grf
Estimating memory usage
Is there a way, or a rule of thumb, to estimate memory usage prior to running grf? I have a dataset of 200k observations and twenty covariates for a job running on a university cluster. 60GB of memory wasn't enough for 10,000 trees and the job crashed. Requesting more memory isn't costless, so I'd like to benchmark in advance how much is actually needed, or at least get some kind of upper limit, to avoid further failed jobs. Thanks a lot!
Memory scales linearly in the number of trees, so you can fit a forest with num.trees = 1 (and ci.group.size = 1) and extrapolate. Bumping up min.node.size (shallower trees) or reducing sample.fraction also helps on huge data sets.
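A minimal sketch of that extrapolation, assuming a causal_forest (the thread doesn't say which forest type is being fit) and simulated data at the question's scale. It fits two small forests that differ only in num.trees, so the fixed overhead of the forest object (e.g. the stored copy of the training data) cancels out of the per-tree estimate:

```r
library(grf)

# Hypothetical data at the question's scale: 200k observations, 20 covariates.
n <- 200000
p <- 20
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] * W + rnorm(n)

# Two small forests differing only in num.trees; the difference in object
# size divided by the difference in tree count gives a per-tree cost.
f1  <- causal_forest(X, Y, W, num.trees = 1,  ci.group.size = 1)
f51 <- causal_forest(X, Y, W, num.trees = 51, ci.group.size = 1)

bytes.fixed    <- as.numeric(object.size(f1))
bytes.per.tree <- (as.numeric(object.size(f51)) - bytes.fixed) / 50

# Linear extrapolation to the full 10,000-tree forest.
estimated.gb <- (bytes.fixed + bytes.per.tree * 10000) / 1024^3
cat(sprintf("Estimated forest size: %.1f GB\n", estimated.gb))
```

Note that object.size() only measures the final forest object, not the peak memory used during training (intermediate buffers and parallel threads add overhead), so treat the estimate as a lower bound and leave some headroom. If you plan to raise min.node.size or lower sample.fraction for the real run, use the same settings in this test so the per-tree estimate reflects them.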