Ryan Bressler issues




Random Features

Generate random derived features: http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/289.pdf

Slow performance splitting on large categorical variables.

BestSplitBigCatIter is too slow. Possibly investigate switching to a fully randomized search earlier, or using "Stochastic Greedy Algorithms: A learning based approach to combinatorial optimization" [1] as in rf-ace.

[1] http://www.thinkmind.org/index.php?view=article&articleid=soft_v4_n12_2011_1

enhancement

Confidence Splitting Criteria

http://nerds.airbnb.com/confidence-splitting-criterions/

Feature selection permutations with randomly assigned cases

http://link.springer.com/article/10.1007/s11222-012-9349-1#page-1 http://www.statistik.lmu.de/PR2/lehre/sk2011/Hapfelmeier.pdf

enhancement

Mondrian Forests for Online Learning

http://arxiv.org/abs/1406.2602

enhancement

Massively Parallel/Out-of-Core Learning

Some references:
Mahout dev mailing list: http://mail-archives.apache.org/mod_mbox/mahout-dev/201302.mbox/%3CCAJQdJb23YDLyNJ-QHSmdDizorDA_d0O8rNy+s8OPbBHCHBp4OA@mail.gmail.com%3E
PLANET: http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf
RainForest: http://www.cs.cornell.edu/johannes/papers/1998/vldb1998-rainforest.pdf

enhancement
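RainForest's key observation is that choosing a split at a node only needs, for each attribute, the AVC-set (counts of each attribute value per class), which can be built in one sequential scan without holding the data in memory. A Go sketch for categorical attributes; `Record`, `AVCSet`, and the `next` iterator are hypothetical:

```go
package main

import "fmt"

// Record is one training case with categorical attribute values and a class label.
type Record struct {
	Attrs []string
	Class string
}

// AVCSet holds, for a single attribute, counts of (attribute value, class) pairs.
type AVCSet map[string]map[string]int

// BuildAVCSets scans a stream of records once. `next` stands in for a reader
// over on-disk data and returns false when the stream is exhausted.
func BuildAVCSets(nAttrs int, next func() (Record, bool)) []AVCSet {
	sets := make([]AVCSet, nAttrs)
	for a := range sets {
		sets[a] = AVCSet{}
	}
	for {
		r, ok := next()
		if !ok {
			break
		}
		for a, v := range r.Attrs {
			if sets[a][v] == nil {
				sets[a][v] = map[string]int{}
			}
			sets[a][v][r.Class]++
		}
	}
	return sets
}

// sliceIter adapts an in-memory slice to the streaming interface for the demo.
func sliceIter(rs []Record) func() (Record, bool) {
	i := 0
	return func() (Record, bool) {
		if i >= len(rs) {
			return Record{}, false
		}
		i++
		return rs[i-1], true
	}
}

func main() {
	data := []Record{
		{Attrs: []string{"sunny", "high"}, Class: "no"},
		{Attrs: []string{"sunny", "normal"}, Class: "yes"},
		{Attrs: []string{"rain", "high"}, Class: "yes"},
		{Attrs: []string{"sunny", "high"}, Class: "no"},
	}
	sets := BuildAVCSets(2, sliceIter(data))
	fmt.Println(sets[0]["sunny"]) // map[no:2 yes:1]
}
```

Split impurity for any candidate can be computed from these counts alone, so only the counts (not the rows) need to fit in memory.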

Bivariate Splitting / Pathway Awareness

"Pathway analysis using random forests with bivariate node-split for survival outcomes" suggests a simple method for splitting on two features at once by choosing sqrt(m) features and looking for splits...

enhancement

Importance Overhaul: What Method(s) to Get P-Values?

P-values for variable importance are desirable because they are easier to interpret and will potentially be easier to drop into our [other tools](https://github.com/cancerregulome/). A couple of different methods seem...

enhancement

question
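One common option, not necessarily among the methods this issue goes on to list, is an empirical permutation p-value: compare the observed importance with importances computed after permuting the response (the idea behind the PIMP method of Altmann et al.). A Go sketch of the final p-value step only; `PermutationPValue` is hypothetical and the null scores are assumed to be computed elsewhere:

```go
package main

import "fmt"

// PermutationPValue returns the one-sided empirical p-value of an observed
// importance score against importance scores drawn from a null distribution
// (e.g. recomputed after permuting the response). The +1 terms count the
// observed value as one draw from the null, so the estimate is never zero.
func PermutationPValue(observed float64, null []float64) float64 {
	atLeast := 0
	for _, v := range null {
		if v >= observed {
			atLeast++
		}
	}
	return float64(1+atLeast) / float64(1+len(null))
}

func main() {
	null := []float64{0.01, -0.02, 0.03, 0.00, 0.05, 0.02, -0.01, 0.04, 0.01}
	// One null value (0.05) is >= 0.045, so p = (1+1)/(1+9).
	fmt.Println(PermutationPValue(0.045, null)) // 0.2
}
```

With N null draws the smallest attainable p-value is 1/(N+1), so the number of permutations bounds the resolution.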

What file formats should be supported for data and models?

enhancement

question

Early Stopping from OOB

Stop tree/forest growth early based on the decrease in OOB error. This could both shorten running time and control overfitting, especially in simple boosted models. Some ideas: http://cavemoosum.blogspot.com.au/2014/02/cross-validation-is-over-long-live.html http://cran.r-project.org/web/packages/gbm/index.html
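A Go sketch of a simple patience-based stopping rule, assuming a hypothetical `addTree` callback that grows one more tree and returns the forest's updated OOB error; `patience` and `tol` are illustrative parameters, not existing options:

```go
package main

import (
	"fmt"
	"math"
)

// GrowWithEarlyStopping calls addTree (assumed to grow one tree and return the
// updated OOB error) until the best OOB error has failed to improve by more
// than tol for `patience` consecutive trees, or maxTrees is reached.
// It returns the number of trees grown and the best OOB error seen.
func GrowWithEarlyStopping(addTree func() float64, maxTrees, patience int, tol float64) (int, float64) {
	best := math.Inf(1)
	sinceImprove := 0
	n := 0
	for n < maxTrees {
		err := addTree()
		n++
		if err < best-tol {
			best = err
			sinceImprove = 0
		} else {
			sinceImprove++
			if sinceImprove >= patience {
				break
			}
		}
	}
	return n, best
}

func main() {
	// Simulated OOB errors standing in for a real forest.
	errs := []float64{0.30, 0.25, 0.22, 0.21, 0.21, 0.215, 0.212, 0.20}
	i := 0
	addTree := func() float64 { e := errs[i]; i++; return e }
	n, best := GrowWithEarlyStopping(addTree, len(errs), 3, 0.005)
	fmt.Println(n, best) // 7 0.21
}
```

The patience window keeps a single noisy OOB estimate from ending growth too early; for boosting, the same loop would track OOB improvement per iteration instead.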