Adam Li comments

Results: 473 comments by Adam Li

Adding streaming trees for all oblique split classifiers

Think this is now fully enabled by #114. @PSSF23 lmk how this works. Will be exciting to see this in action.

Adding streaming trees for all oblique split classifiers

As of now, this is mostly addressed by the PRs in our upstream scikit-learn fork. However, as discussed offline, there are some issues regarding segfaults that occur...

Implement binning of `X` as an optional preprocessing step for trees

Downstream, it will be useful to: 1) verify that all unit tests pass, and 2) compare w/ and w/o binning in terms of accuracy/roc_score/precision vs. fit and predict times. I think this...
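A rough sketch of the kind of comparison harness meant in the comment above. The function name `benchmark` and the returned keys are hypothetical, not part of any existing API. It assumes only that the model exposes scikit-learn-style `fit` and `predict` methods, and it reports plain accuracy; other metrics would be added the same way.

```python
import time

import numpy as np


def benchmark(model, X_train, y_train, X_test, y_test):
    """Time fit and predict for one model and report test accuracy.

    Hypothetical helper: `model` is any object with scikit-learn-style
    `fit(X, y)` and `predict(X)` methods.
    """
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = model.predict(X_test)
    predict_time = time.perf_counter() - start

    # Fraction of test samples whose predicted label matches the true label.
    accuracy = float(np.mean(np.asarray(y_pred) == np.asarray(y_test)))
    return {
        "fit_time": fit_time,
        "predict_time": predict_time,
        "accuracy": accuracy,
    }
```

Running this once on the raw `X` and once on a binned copy, with the same estimator settings and random seed, gives the w/ vs. w/o binning comparison. Repeating over several seeds would reduce timing noise.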

Implement binning of `X` as an optional preprocessing step for trees

Relevant paper: https://arxiv.org/pdf/1609.06119.pdf

Implement binning of `X` as an optional preprocessing step for trees

See related discussion in scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/5212

Implement binning of `X` as an optional preprocessing step for trees

This has been completed naively in upstream `scikit-learn-tree` (i.e., the fork of the scikit-learn repo).

Implement binning of `X` as an optional preprocessing step for trees

We can already bin naively for all forests. That is, we bin at the Python API level. Whether or not this improves matters is a separate experimental question. We...
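To make "binning at the Python API level" concrete, here is a minimal sketch that uses quantile binning with NumPy only. The names `fit_bin_edges`, `apply_bins`, and `n_bins` are hypothetical and are not the fork's actual implementation. The sketch bins each feature once over the whole dataset before fitting, not per node.

```python
import numpy as np


def fit_bin_edges(X, n_bins=255):
    """Compute per-feature quantile thresholds from the training data.

    Hypothetical helper. Returns one sorted array of interior bin edges
    per feature; duplicate edges (e.g. for near-constant features) are
    dropped.
    """
    if not 2 <= n_bins <= 256:
        raise ValueError("n_bins must be in [2, 256] to fit in uint8")
    X = np.asarray(X, dtype=np.float64)
    # Interior quantiles only: n_bins bins need n_bins - 1 thresholds.
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return [np.unique(np.quantile(X[:, j], quantiles)) for j in range(X.shape[1])]


def apply_bins(X, edges):
    """Map each value to the integer index of its bin, stored as uint8."""
    X = np.asarray(X, dtype=np.float64)
    X_binned = np.empty(X.shape, dtype=np.uint8)
    for j, feature_edges in enumerate(edges):
        # Index = number of thresholds that are <= the value.
        X_binned[:, j] = np.searchsorted(feature_edges, X[:, j], side="right")
    return X_binned
```

The binned array would then be passed to the forest's `fit` in place of `X`. Data passed to `predict` needs to go through `apply_bins` with the same edges computed at fit time.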

Implement binning of `X` as an optional preprocessing step for trees

> I don't understand. Are you saying one could naively implement bin per node in Python, and that code would be easy to write? Can you show us the code...

Implement binning of `X` as an optional preprocessing step for trees

> @adam2392 So you hypothesize, without strong empirical results, that this is _too_ slow, for some definition of _too_?

Yes, because the Cython code remains the same, so it would...

We should keep track of "constant features" similar to scikit-learn RF

This is not so trivial to do w/o adding runtime overhead. The current scikit-learn splitter, which operates on one feature at a time, has the nice property that for: 1....