RandomerForest icon indicating copy to clipboard operation
RandomerForest copied to clipboard

Compute probability of split direction

Open tyler-tomita opened this issue 7 years ago • 0 comments

Suppose the theoretically optimal split direction at an arbitrary node is defined by a vector v*. Assume p is large, and v* has a sparsity of s. For RerF and RR-RF, let the selected split direction be \hat{v}{rerf} and \hat{v}{rr-rf}, respectively. Compute the probability that the angle between \hat{v}{rerf} and v* and \hat{v}{rr-rf} and v* is less than \delta.

My intuition says that when sparsity of v* is very close to one, RerF has a reasonable probability of finding a split direction close to the optimal, while RR-RF does not. When sparsity of v* is close to zero, RerF has no chance of getting close because of the sparsity constraint on the split directions, and RR-RF has a finite but very small chance of getting close because it's high dimensional. Even if RR-RF does get close to the right split direction, estimation of the location of the split will be poor if p >> n. Therefore, when p >> n, RerF will do at least as well, and sometimes substantially better than RR-RF because of the "bet on sparsity" principle.

tyler-tomita avatar Apr 04 '17 22:04 tyler-tomita