
support streaming (and possibly parallel) decision tree option

Open jovo opened this issue 4 years ago • 4 comments

i think the best implementation is probably streamDM (https://github.com/huawei-noah/streamDM), but it runs on spark. it is based on this paper: https://dl.acm.org/doi/10.1145/347090.347107

also another implementation: https://github.com/soundcloud/spdt
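
for context, a minimal sketch of the split test used by Hoeffding-tree style streaming learners, which is the family streamDM implements as far as i can tell. the function names, the default `delta`, and the use of information gain as the split score are illustrative assumptions, not code taken from streamDM or ProgLearn:

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this distance of the mean
    observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_seen, n_classes=2, delta=1e-7):
    """Decide whether a leaf has seen enough samples to split.

    Hypothetical helper: split only when the gap between the two best
    candidate splits exceeds the Hoeffding bound.
    """
    # Information gain is bounded by log2(number of classes).
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n_seen)
    return (best_gain - second_gain) > epsilon
```

the point of the test is that a leaf only needs summary statistics for the samples it has seen so far, so the tree can grow from a stream without storing the data.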

jovo, Jul 31 '20 16:07

some additional relevant papers:

  • https://ieeexplore.ieee.org/document/1571498
  • https://link.springer.com/chapter/10.1007/978-3-642-15880-3_15
  • https://dl.acm.org/doi/10.1145/3054925
  • https://link.springer.com/article/10.1007/s10994-017-5642-8
  • https://ieeexplore.ieee.org/document/4318108
  • https://www.jmlr.org/papers/v11/ben-haim10a.html
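
as a rough illustration of the last link (the Ben-Haim and Tom-Tov streaming parallel decision tree paper), here is a sketch of the fixed-size online histogram that approach builds per feature. the class and method names are hypothetical, and this covers only the update step, not the merge across workers or the split search:

```python
import bisect


class StreamingHistogram:
    """Approximate histogram with at most max_bins (centroid, count) bins."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # kept sorted by centroid; each entry is [centroid, count]

    def update(self, x):
        """Add one observation, merging the two closest bins if over budget."""
        centroids = [b[0] for b in self.bins]
        i = bisect.bisect_left(centroids, x)
        if i < len(self.bins) and self.bins[i][0] == x:
            # Exact match with an existing centroid: just bump its count.
            self.bins[i][1] += 1
            return
        self.bins.insert(i, [x, 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        """Replace the two adjacent bins with the smallest gap by their
        count-weighted average."""
        gaps = [self.bins[j + 1][0] - self.bins[j][0]
                for j in range(len(self.bins) - 1)]
        j = gaps.index(min(gaps))
        (p1, m1), (p2, m2) = self.bins[j], self.bins[j + 1]
        self.bins[j:j + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]
```

memory stays bounded by `max_bins` no matter how long the stream is, which is what makes this kind of summary usable for streaming split finding.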

finally, note that sklearn does not currently support this functionality: https://scikit-learn.org/stable/modules/computing.html#incremental-learning

jovo, Aug 11 '20 19:08

Interested in this issue for Sprint 1. Proposed DoD (definition of done): allow transformers to train on streaming data.

PSSF23, Sep 23 '20 03:09

Could I please be assigned to this issue? Thanks in advance.

KevinWang905, Sep 16 '21 18:09

Could I please be assigned to this issue as well?

nhahn7, Oct 07 '21 18:10