ProgLearn
Support a streaming (and possibly parallel) decision tree option
I think the best implementation is probably this one: https://github.com/huawei-noah/streamDM, but it is built on Spark. It is based on this paper: https://dl.acm.org/doi/10.1145/347090.347107
Another implementation: https://github.com/soundcloud/spdt
Some additional relevant papers:
- https://ieeexplore.ieee.org/document/1571498
- https://link.springer.com/chapter/10.1007/978-3-642-15880-3_15
- https://dl.acm.org/doi/10.1145/3054925
- https://link.springer.com/article/10.1007/s10994-017-5642-8
- https://ieeexplore.ieee.org/document/4318108
- https://www.jmlr.org/papers/v11/ben-haim10a.html
Finally, note that sklearn does not currently support this functionality: https://scikit-learn.org/stable/modules/computing.html#incremental-learning
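For context on what a streaming tree would need, here is a minimal sketch of the split test from the Hoeffding tree paper linked above (https://dl.acm.org/doi/10.1145/347090.347107). It is only an illustration, not ProgLearn or streamDM code: the function names `hoeffding_bound` and `should_split`, the default `delta`, and the choice of information gain (range `log2(n_classes)`) as the split criterion are all assumptions made for the example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the mean of n
    observations of a variable with the given range is within epsilon of
    its true mean (Hoeffding's inequality)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 n_classes: int = 2, delta: float = 1e-7) -> bool:
    """Decide whether a leaf has seen enough samples to split.

    Hypothetical helper: assumes the criterion is information gain, whose
    range is log2(n_classes). The leaf splits on the best attribute once its
    observed advantage over the runner-up exceeds the Hoeffding bound.
    """
    epsilon = hoeffding_bound(math.log2(n_classes), delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # The same observed gap between the two best attributes becomes
    # sufficient evidence as more samples arrive at the leaf.
    for n in (200, 2_000, 20_000):
        print(n, should_split(best_gain=0.30, second_gain=0.25, n=n))
```

The point of the sketch is that a leaf only needs running sufficient statistics (counts per attribute value and class) plus the sample count `n`, so each sample can be discarded after it updates the counts. A real implementation would also need tie-breaking and a grace period between split checks, which are omitted here.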
I am interested in this issue for Sprint 1. Proposed DoD: allow transformers to use streaming data for training.
Could I please be assigned to this issue? Thanks in advance.
Could I please be assigned to this issue as well?