Ryan Bressler
Generate random derived features: http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/289.pdf
BestSplitBigCatIter is too slow. Possibly investigate using fully randomized search earlier, or "Stochastic Greedy Algorithms: A learning based approach to combinatorial optimization" [1], as in rf-ace. [1] http://www.thinkmind.org/index.php?view=article&articleid=soft_v4_n12_2011_1
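A minimal sketch of what a fully randomized search over categorical splits could look like, assuming a classification target scored by Gini impurity. The function names `gini` and `random_cat_split`, the `n_tries` budget, and the coin-flip sampling of levels are all hypothetical illustrations; this is not the existing BestSplitBigCatIter logic or the rf-ace method.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def random_cat_split(cats, labels, n_tries=50, seed=0):
    """Try n_tries random partitions of the category levels and keep the best.

    Returns (left_levels, gain), where left_levels is the set of levels sent
    left, or None if no sampled partition reduced impurity.
    """
    rng = random.Random(seed)
    levels = sorted(set(cats))
    parent = gini(labels)
    n = len(labels)
    best_gain, best_left = 0.0, None
    for _ in range(n_tries):
        # Assumption: each level goes left with probability 1/2.
        left_levels = {lv for lv in levels if rng.random() < 0.5}
        if not left_levels or len(left_levels) == len(levels):
            continue  # skip trivial partitions
        left = [y for c, y in zip(cats, labels) if c in left_levels]
        right = [y for c, y in zip(cats, labels) if c not in left_levels]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_left = gain, left_levels
    return best_left, best_gain
```

The cost is fixed at `n_tries` impurity evaluations regardless of how many levels the feature has, at the price of possibly missing the best partition.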
http://nerds.airbnb.com/confidence-splitting-criterions/
http://link.springer.com/article/10.1007/s11222-012-9349-1#page-1 http://www.statistik.lmu.de/PR2/lehre/sk2011/Hapfelmeier.pdf
Some references: http://mail-archives.apache.org/mod_mbox/mahout-dev/201302.mbox/%3CCAJQdJb23YDLyNJ-QHSmdDizorDA_d0O8rNy+s8OPbBHCHBp4OA@mail.gmail.com%3E ; PLANET: http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf ; RainForest: http://www.cs.cornell.edu/johannes/papers/1998/vldb1998-rainforest.pdf
"Pathway analysis using random forests with bivariate node-split for survival outcomes" suggests a simple method for splitting on two features at once by choosing sqrt(m) features and looking for splits...
P-Values for variable importance are desirable as they are easier to interpret and will be potentially easier to drop in to our [other tools](https://github.com/cancerregulome/). A couple of different methods seem...
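One commonly used way to get such p-values is a permutation test: compare a feature's observed importance against importances obtained after shuffling the target. The sketch below assumes those null importances have already been computed by some other routine; `permutation_p_value` is a hypothetical helper, and this may or may not be one of the methods the issue has in mind.

```python
def permutation_p_value(observed, null_importances):
    """Empirical one-sided p-value for an observed importance score.

    null_importances holds scores for the same feature computed on data
    whose target was randomly permuted (assumed to be supplied by the caller).
    The +1 terms keep the estimate away from exactly zero.
    """
    exceed = sum(1 for v in null_importances if v >= observed)
    return (exceed + 1) / (len(null_importances) + 1)
```

With 4 null scores of which 1 is at least as large as the observed score, this gives (1 + 1) / (4 + 1) = 0.4.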
Stop tree/forest growth early based on the decrease in out-of-bag (OOB) error. This could both shorten running time and control overfitting, especially in simple boosted models. Some ideas: http://cavemoosum.blogspot.com.au/2014/02/cross-validation-is-over-long-live.html http://cran.r-project.org/web/packages/gbm/index.html
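A rough sketch of one possible stopping rule, assuming the OOB error is recorded after each tree is added. The function name `early_stop_index` and the `patience` and `min_delta` parameters are hypothetical, not existing options.

```python
def early_stop_index(oob_errors, patience=10, min_delta=0.0):
    """Return how many trees to keep, given the OOB error after each tree.

    Growth stops once `patience` consecutive trees fail to improve on the
    best error seen so far by more than `min_delta`.
    """
    best_err = float("inf")
    best_n = 0
    for n, err in enumerate(oob_errors, start=1):
        if err < best_err - min_delta:
            best_err = err
            best_n = n
        elif n - best_n >= patience:
            break  # no meaningful improvement for `patience` trees
    return best_n
```

For example, with errors `[0.5, 0.4, 0.35, 0.36, 0.37, 0.36, 0.38]` and `patience=3`, the best error is reached at 3 trees and the loop stops at the sixth, so the function returns 3.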