SPORF
This is the implementation of Sparse Projection Oblique Randomer Forest
see https://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf
verify that performance is comparable
@jovo would like tests for the following:
- [ ] Time it takes to convert from row-major to col-major and vice versa (done in C++, not Python)
- [ ] Times...
compare methods for handling missing data
- imputing
- omitting
- ?
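To make the first two options concrete, here is a minimal sketch in plain Python. It assumes missing entries are marked with `None` and that imputation means filling with the per-feature mean of the observed values; both choices, and the function names `omit_missing` and `impute_mean`, are illustrative and not taken from the SPORF code.

```python
def omit_missing(rows):
    """Drop every sample (row) that has at least one missing entry."""
    return [row for row in rows if None not in row]


def impute_mean(rows):
    """Fill each missing entry with the mean of the observed values
    in the same feature (column). Assumes every column has at least
    one observed value."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy data: 4 samples, 2 features, two missing entries.
X = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]

kept = omit_missing(X)    # keeps only the fully observed samples
filled = impute_mean(X)   # keeps all samples, fills the gaps
```

Omitting shrinks the training set (here from 4 samples to 2), while imputing keeps every sample at the cost of introducing values that were never observed. A comparison would train on both versions and compare held-out accuracy.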
Let n be the number of samples and m the number of unique values of a given feature. Then for that feature, we only need to search over min(n,m)...
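A minimal sketch of the idea, assuming candidate split points are placed at midpoints between consecutive distinct values of the feature. The midpoint convention and the helper name `candidate_thresholds` are assumptions for illustration, not the repository's implementation.

```python
def candidate_thresholds(feature_values):
    """Return the midpoints between consecutive distinct values of one feature.

    The number of candidates depends on how many distinct values the
    feature takes, not on how many samples there are.
    """
    distinct = sorted(set(feature_values))  # duplicates removed, ascending
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


# n = 8 samples, but only m = 3 unique values for this feature.
x = [3, 1, 3, 2, 1, 3, 2, 1]
thresholds = candidate_thresholds(x)
```

For a feature with few distinct values (binary or categorical-coded columns, for example), this avoids re-evaluating the same split once per duplicate sample.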
These two statistics for node impurity seem very similar (up to a shift?): I'm not sure if there is a benefit to using one over the other, @jovo? https://github.com/neurodata/RerF/blob/c4d602cd4d763dc728bb48e2cf84114638d9f074/packedForest/src/forestTypes/binnedTree/inNodeClassTotals.h#L60-L77 [giniTest.Rmd](https://gist.github.com/MrAE/ec4e10bcdae2596809576ff3184804f3)...
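As an illustration of how two impurity statistics can differ only by a shift, the sketch below compares the size-weighted Gini impurity of a split with a sum-of-squared-counts score. Whether these are the two statistics computed at the linked lines is an assumption; the function names are hypothetical. For per-child class counts n_kc with child sizes n_c and n samples in total, the weighted Gini equals n minus the sum over children of Σ_k n_kc² / n_c, so minimizing one is the same as maximizing the other.

```python
def weighted_gini(children):
    """Sum over children of n_c * (1 - sum_k p_kc^2).

    `children` is a list of per-child class-count lists.
    """
    total = 0.0
    for counts in children:
        n_c = sum(counts)
        if n_c == 0:
            continue  # an empty child contributes nothing
        total += n_c * (1.0 - sum((k / n_c) ** 2 for k in counts))
    return total


def sum_of_squares_score(children):
    """Sum over children of (sum_k n_kc^2) / n_c."""
    total = 0.0
    for counts in children:
        n_c = sum(counts)
        if n_c == 0:
            continue
        total += sum(k * k for k in counts) / n_c
    return total


# Two candidate splits of the same 20 samples (two classes, two children).
split_a = [[8, 2], [1, 9]]
split_b = [[5, 5], [4, 6]]
n = 20

# The two statistics differ by the constant n and a sign flip.
shift_a = weighted_gini(split_a) + sum_of_squares_score(split_a)
shift_b = weighted_gini(split_b) + sum_of_squares_score(split_b)
```

Because the shift is the same for every candidate split of a node, both statistics rank splits identically. Under this assumption any difference between them would be in arithmetic cost or numerical behavior rather than in the split chosen.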
Probably want to install both R and Python versions? Or maybe separate Dockerfiles for each? Can use this as a base: https://github.com/tpsatish95/deep-conv-rf/blob/master/Dockerfile