online-random-forests

Understanding parameter

Open dberma15 opened this issue 8 years ago • 3 comments

Hi,

I was wondering if you could explain the significance of two parameters in the code. The first is "numRandomTests". Is this parameter analogous to mtry in Breiman's random forest (the number of features randomly selected at each node)?

The second is "numProjectionFeatures". In a traditional random forest, the split at each node occurs on one variable. When this parameter is set to a value greater than 1, does it create a function of several features that the data is split on? Is there an analogous parameter in traditional random forests?

Thanks,

Daniel

dberma15 Jun 22 '16 15:06

Hi Daniel,

  • numRandomTests: Yes, it's the number of features to be randomly chosen at each node
  • numProjectionFeatures: This implementation splits the data with hyperplanes. A traditional RF uses axis-aligned hyperplanes (i.e. it splits on a single feature). You can generalize that by choosing numProjectionFeatures features, building a random hyperplane from them, and using that hyperplane to split the data. I think that if you set it to 1, it will create axis-aligned cuts similar to traditional forests.
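
To make the hyperplane idea concrete, here is a minimal Python sketch of a single random split test. It is not the code from this repository: the function names (`make_random_test`, `project`, `goes_right`), the uniform weight range, and the fixed threshold are all assumptions chosen for illustration.

```python
import random


def make_random_test(num_features, num_projection_features, rng):
    """Pick a random subset of feature indices and a random weight for each.

    Hypothetical helper: the weight range [-1, 1] is an assumption.
    """
    feature_ids = rng.sample(range(num_features), num_projection_features)
    weights = [rng.uniform(-1.0, 1.0) for _ in feature_ids]
    return feature_ids, weights


def project(x, feature_ids, weights):
    """Project a sample onto the hyperplane normal (weighted sum of chosen features)."""
    return sum(w * x[i] for i, w in zip(feature_ids, weights))


def goes_right(x, feature_ids, weights, threshold):
    """Route the sample to the right child if its projection exceeds the threshold."""
    return project(x, feature_ids, weights) > threshold


rng = random.Random(0)
x = [0.2, 1.5, -0.7, 3.0]

# One projection feature: the test depends on a single coordinate of x,
# so the cut is axis-aligned, as in a traditional random forest.
axis_ids, axis_weights = make_random_test(len(x), 1, rng)

# Three projection features: the test is a weighted sum of three coordinates,
# i.e. an oblique hyperplane.
oblique_ids, oblique_weights = make_random_test(len(x), 3, rng)

print(axis_ids, goes_right(x, axis_ids, axis_weights, threshold=0.0))
print(oblique_ids, goes_right(x, oblique_ids, oblique_weights, threshold=0.0))
```

With a single projection feature the weight only rescales (or flips) the comparison on that one coordinate, which is why the result is equivalent to an ordinary single-feature threshold.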

Amir

amirsaffari Jun 22 '16 16:06

Thank you for your reply. I was wondering if you could define epochs. Is that the number of times it cycles through the data?

Is it possible to train the model in one go, using full-batch instead of online, for comparison purposes?

dberma15 Jun 28 '16 17:06

Yes, epochs is the number of times it goes through the entire dataset.
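
As a rough illustration of what an epoch means for an online learner, the sketch below feeds every sample to the model one at a time and repeats the full pass `num_epochs` times. The `OnlineCentroidClassifier` is a hypothetical toy learner, not the forest from this repository, and reshuffling the sample order on each pass is an assumption.

```python
import random


class OnlineCentroidClassifier:
    """Hypothetical toy online learner: keeps a running mean per class."""

    def __init__(self):
        self.counts = {}
        self.means = {}
        self.num_updates = 0

    def update(self, x, label):
        """Consume one sample and update that class's running mean."""
        n = self.counts.get(label, 0) + 1
        mean = self.means.get(label, [0.0] * len(x))
        self.means[label] = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.counts[label] = n
        self.num_updates += 1

    def predict(self, x):
        """Return the label whose running mean is closest to x."""
        def sq_dist(mean):
            return sum((xi - m) ** 2 for xi, m in zip(x, mean))
        return min(self.means, key=lambda label: sq_dist(self.means[label]))


def train_online(model, samples, labels, num_epochs, seed=0):
    """One epoch = one full pass over the dataset, one sample at a time."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    for _ in range(num_epochs):
        rng.shuffle(order)  # assumption: visit samples in a new order each pass
        for i in order:
            model.update(samples[i], labels[i])


samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
labels = [0, 0, 1, 1]

model = OnlineCentroidClassifier()
train_online(model, samples, labels, num_epochs=3)
print(model.num_updates)  # 3 epochs x 4 samples = 12 single-sample updates
```

The model never sees more than one sample per update, which is the difference from offline training where the whole dataset is available at once.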

If by full-batch you mean offline training, where the learner has access to the entire dataset at once, then no, I'm afraid you can't train it like that, as the algorithm is designed for online training.

However, you can use sklearn or any other library that offers an RF implementation for offline training.
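
For reference, a minimal offline baseline with scikit-learn could look like the sketch below. The synthetic dataset and the hyperparameter values are placeholders chosen for illustration; `max_features` is scikit-learn's counterpart of mtry, and its trees use single-feature (axis-aligned) splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; substitute the dataset used with the online forest.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Offline (batch) training: the forest sees the whole training set at once.
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_features="sqrt",   # features considered at each split (mtry)
    random_state=0,
)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Evaluating both models on the same held-out split keeps the comparison with the online forest fair.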

amirsaffari Jun 28 '16 18:06