online-random-forests
Understanding parameters
Hi,
I was wondering if you could detail the significance of two parameters in the code. The first is "numRandomTests". Is this parameter analogous to mtry in Breiman's random forest (the number of features randomly selected at each node)?
The second is "numProjectionFeatures". In a traditional random forest, the split at each node occurs on one variable. When this parameter is set to a value greater than 1, does it create a function of several variables that the data is split on? Is there an analogous parameter in traditional random forests?
Thanks,
Daniel
Hi Daniel,
- numRandomTests: Yes, it's the number of features to be randomly chosen at each node.
- numProjectionFeatures: In this implementation, I use hyperplanes to split the data. The traditional RF uses axis-aligned hyperplanes (i.e. splitting only on a single feature). You can generalize that to choosing numProjectionFeatures features, creating a random hyperplane from them, and then using that to split the data. I think if you set it to 1, it will create axis-aligned cuts similar to traditional forests.
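
To make the hyperplane idea concrete, here is a minimal Python sketch. It is not the code from this repository: the function names, the uniform weights in [-1, 1], and the way the threshold is drawn are all assumptions made for illustration.

```python
import random


def project(x, features, weights):
    """Weighted sum of the selected features of one sample."""
    return sum(w * x[f] for f, w in zip(features, weights))


def make_random_test(samples, num_projection_features, rng):
    """Build one random hyperplane test over a random subset of features.

    Assumptions (not taken from the repository): weights are uniform in
    [-1, 1], and the threshold is drawn between the smallest and largest
    projection observed in `samples`.
    """
    num_features = len(samples[0])
    features = rng.sample(range(num_features), num_projection_features)
    weights = [rng.uniform(-1.0, 1.0) for _ in features]
    projections = [project(x, features, weights) for x in samples]
    threshold = rng.uniform(min(projections), max(projections))
    return features, weights, threshold


def goes_right(x, test):
    """Return True if sample x falls on the 'right' side of the hyperplane."""
    features, weights, threshold = test
    return project(x, features, weights) > threshold
```

With num_projection_features set to 1, the test depends on a single feature, so the cut is axis-aligned as in a traditional forest. With a larger value, the cut is an oblique hyperplane in the subspace of the chosen features.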
Amir
Thank you for your reply. I was wondering if you could define epochs. Is that the number of times it cycles through the data?
Is it possible to train the model in one go, using full-batch instead of online, for comparison purposes?
Yes, epochs is the number of times it goes through the entire dataset.
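
As a rough illustration of what an epoch means for an online learner, here is a small Python sketch. CountingLearner and train_online are hypothetical names, and reshuffling the sample order at each epoch is an assumption, not something confirmed in this thread.

```python
import random


class CountingLearner:
    """Hypothetical stand-in for an online model: it only counts updates."""

    def __init__(self):
        self.num_updates = 0

    def update(self, x, y):
        # A real online forest would update its trees with (x, y) here.
        self.num_updates += 1


def train_online(learner, data, labels, num_epochs, seed=0):
    """Feed the dataset to the learner one sample at a time, num_epochs times."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(num_epochs):
        rng.shuffle(order)  # assumption: a new random order for every epoch
        for i in order:
            learner.update(data[i], labels[i])
    return learner
```

The learner never sees more than one sample per update. More epochs simply mean each sample is presented more times.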
If by full-batch you mean offline training, where the learner has access to the entire dataset at once, then no, I'm afraid you can't train it like that, as the algorithm is designed for online training.
However, you can use sklearn or any other library that offers an RF implementation for offline training.
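
For an offline baseline, something along these lines should work with scikit-learn. The synthetic dataset and the parameter values are placeholders chosen for illustration. In RandomForestClassifier, max_features plays the role of mtry.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data as a placeholder; substitute your own features and labels.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# max_features is the number of features considered at each split (mtry).
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)  # offline: the whole training set at once
accuracy = forest.score(X_test, y_test)
```

Note that scikit-learn's trees split on a single feature per node, so this corresponds roughly to numProjectionFeatures set to 1.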