Single Decision Tree
Implement Standard Decision Tree.
Specific task for student Yuliia Syzon.
Adam Valenta commented: Currently, we are focusing on binomial classification with UniformAdaptive histogram.
In progress:
- Fix bug with different prediction on different machines
- Benchmark training time and scalability against DRF
- Fix the binning so that it passes through the data only once, rather than once for each bin and each feature
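The single-pass binning item above can be illustrated with a minimal sketch. It is not the h2o-3 implementation: the function names `bin_index` and `build_histograms` are hypothetical, the per-feature value ranges are assumed to be known in advance, and the bins are plain equal-width bins with binary labels 0/1.

```python
from typing import List, Sequence, Tuple


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value to an equal-width bin in [lo, hi]; out-of-range values are clamped."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def build_histograms(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    ranges: Sequence[Tuple[float, float]],
    n_bins: int,
) -> List[List[List[int]]]:
    """Return hist[feature][bin] = [count of class 0, count of class 1].

    Each row is visited exactly once; all features are binned during that visit,
    instead of rescanning the data for every (feature, bin) pair.
    """
    hist = [[[0, 0] for _ in range(n_bins)] for _ in ranges]
    for row, label in zip(rows, labels):
        for f, (lo, hi) in enumerate(ranges):
            hist[f][bin_index(row[f], lo, hi, n_bins)][label] += 1
    return hist


rows = [[0.0, 10.0], [0.5, 20.0], [1.0, 30.0], [0.25, 10.0]]
labels = [0, 1, 1, 0]
hist = build_histograms(rows, labels, ranges=[(0.0, 1.0), (10.0, 30.0)], n_bins=2)
```

The cost is one pass over the rows with work proportional to the number of features per row, independent of the number of bins.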
Major stuff on roadmap:
- Support categorical splits
- Support NaNs splits
- Support other types of histograms:
  - UniformAdaptive (done)
  - Random
  - QuantilesGlobal
  - RoundRobin
- Support Tree printing
- Support multinomial classification
- Support regression
- Support scoring in training process
- Support grid search
- Support weights
- Support early stopping
- Make findBestSplit parallel
- Support cross validation
- Support MOJO
- Support Machine Learning Interpretability features: Shapley, PDP plots, …
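For the "Make findBestSplit parallel" item, the following is a rough sketch of why the search parallelizes naturally: once per-feature histograms exist, each feature can be scanned independently. Everything here is an assumption for illustration: the names `best_split_for_feature` and `find_best_split` are hypothetical, Gini impurity is used as the split criterion (the actual implementation may use a different one), and the histogram layout is `hist[feature][bin] = [count of class 0, count of class 1]`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def gini(c0: int, c1: int) -> float:
    """Gini impurity of a node holding c0 negatives and c1 positives."""
    n = c0 + c1
    if n == 0:
        return 0.0
    p = c1 / n
    return 2.0 * p * (1.0 - p)


def best_split_for_feature(bins: List[List[int]]) -> Optional[Tuple[float, int]]:
    """Scan bin boundaries left to right; return (weighted impurity, last left bin)."""
    total0 = sum(b[0] for b in bins)
    total1 = sum(b[1] for b in bins)
    total = total0 + total1
    best = None
    left0 = left1 = 0
    for i in range(len(bins) - 1):
        left0 += bins[i][0]
        left1 += bins[i][1]
        right0, right1 = total0 - left0, total1 - left1
        n_left, n_right = left0 + left1, right0 + right1
        if n_left == 0 or n_right == 0:
            continue  # a split must leave rows on both sides
        score = (n_left * gini(left0, left1) + n_right * gini(right0, right1)) / total
        if best is None or score < best[0]:
            best = (score, i)
    return best


def find_best_split(hist: List[List[List[int]]]) -> Optional[Tuple[float, int, int]]:
    """Evaluate features concurrently; return (impurity, feature, last left bin)."""
    with ThreadPoolExecutor() as pool:
        per_feature = list(pool.map(best_split_for_feature, hist))
    candidates = [(s[0], f, s[1]) for f, s in enumerate(per_feature) if s is not None]
    return min(candidates) if candidates else None


hist = [[[2, 2], [2, 2]], [[4, 0], [0, 4]]]
best = find_best_split(hist)
```

Threads in CPython do not speed up CPU-bound work; the pool only shows that the per-feature scans share no state, which is what a parallel implementation would rely on.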
Minor stuff to be done:
- Restructure the code into a StandardDecisonTree class and try to work out an interface for a shallow tree
- Add parameters nbins, mtries, custom_metric, min_rows, (nbins_cats), min_split_improvement, categorical_encoding, check_constant_response
- Investigate the nbins_top_level parameter
- Add API tests and save-load tests
- Benchmark of training time compared to DRF
- Benchmark of prediction performance compared to DRF, Decision Tree (done)
- Sphinx documentation
- Enhance code documentation (Python examples, R examples…)
Other
- Support sample_rate_per_class, col_sample_rate_change_per_level, checkpoint
Wendy Wong commented: For this JIRA, the focus is on:
- binary classification trees only
- numerical predictors only
- Python client support
- R client support
For performance, please also add an R single decision tree to the comparison. Wendy Wong can assist in this effort.
JIRA Issue Details
Jira Issue: PUBDEV-8691
Assignee: Yuliia Syzon
Reporter: Adam Valenta
State: In Progress
Fix Version: 3.42.0.1
Attachments: N/A
Development PRs: Available
Linked PRs from JIRA
https://github.com/h2oai/h2o-3/pull/6182
https://github.com/h2oai/h2o-3/pull/6447
Next steps:
- add NA support;
- add multinomial classification;
- add regression.