Adam Li
Think this is now fully enabled by #114. @PSSF23 lmk how this works. Will be exciting to see this in action.
As of now, this is mostly addressed w/ the PRs in our upstream scikit-learn fork. However, as discussed offline, there are some issues with regard to segfaults that occur...
Downstream, it will be useful to: 1) verify that all unit tests pass, and 2) compare runs w/ and w/o binning in terms of accuracy/roc_score/precision vs. fit time and predict time. I think this...
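A minimal sketch of what the comparison in 2) could look like. Everything here is an assumption for illustration: the synthetic dataset, the stock `RandomForestClassifier`, and `KBinsDiscretizer` as a stand-in for binning (it is not the fork's binning code), and `benchmark` is a hypothetical helper name.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer


def benchmark(X_train, X_test, y_train, y_test, seed=0):
    """Fit a forest and report accuracy, ROC AUC, fit time and predict time."""
    clf = RandomForestClassifier(n_estimators=20, random_state=seed)

    t0 = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    proba = clf.predict_proba(X_test)[:, 1]
    predict_time = time.perf_counter() - t0

    return {
        "accuracy": accuracy_score(y_test, clf.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
        "fit_time": fit_time,
        "predict_time": predict_time,
    }


# Synthetic binary classification data (assumption: any dataset would do).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for binning: quantile bins learned on the training split only.
binner = KBinsDiscretizer(n_bins=32, encode="ordinal", strategy="quantile")
X_train_binned = binner.fit_transform(X_train)
X_test_binned = binner.transform(X_test)

results = {
    "raw": benchmark(X_train, X_test, y_train, y_test),
    "binned": benchmark(X_train_binned, X_test_binned, y_train, y_test),
}
for name, metrics in results.items():
    print(name, metrics)
```

Timings from a single run like this are noisy, so a real comparison would repeat over several seeds and dataset sizes before drawing conclusions.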
Relevant paper: https://arxiv.org/pdf/1609.06119.pdf
See related discussion in scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/5212
This has been completed naively in upstream `scikit-learn-tree` (i.e. the fork of the scikit-learn repo).
We can already bin naively for all forests. That is, we bin at the Python API level. Whether or not this improves matters is a separate, experimental question. We...
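For illustration, a rough sketch of what binning at the Python level could look like, using only NumPy. This is not the code in the fork: `fit_bin_edges` and `apply_bins` are hypothetical helpers, and quantile-based edges are an assumption.

```python
import numpy as np


def fit_bin_edges(X, max_bins=255):
    """Compute per-feature quantile cut points (hypothetical helper)."""
    # max_bins bins need max_bins - 1 interior cut points.
    quantiles = np.linspace(0, 1, max_bins + 1)[1:-1]
    # One sorted, de-duplicated array of cut points per feature.
    return [np.unique(np.quantile(X[:, j], quantiles)) for j in range(X.shape[1])]


def apply_bins(X, edges):
    """Map each value to the integer index of its bin."""
    # uint8 is enough as long as max_bins <= 256.
    X_binned = np.empty(X.shape, dtype=np.uint8)
    for j, cuts in enumerate(edges):
        X_binned[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return X_binned


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

edges = fit_bin_edges(X, max_bins=16)
X_binned = apply_bins(X, edges)
```

In this sketch the data is binned once, before any tree is built, and `X_binned` would be passed to the forest's `fit` in place of `X`; test data would go through `apply_bins` with the same `edges`. The tree-building code itself is untouched.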
> I don't understand. Are you saying one could naively implement bin per node in Python, and that code would be easy to write? Can you show us the code...
> @adam2392 So you hypothesize, without strong empirical results, that this is _too_ slow, for some definition of _too_?

Yes because the Cython code remains the same, so it would...
This is not so trivial to do w/o adding a runtime penalty. The current scikit-learn splitter, which operates on one feature at a time, has the nice property that for: 1....