h2o-3

Single Decision Tree

Open exalate-issue-sync[bot] opened this issue 1 year ago • 5 comments

Implement Standard Decision Tree.

Specific task for student Yuliia Syzon.

exalate-issue-sync[bot], May 11 '23 15:05

Adam Valenta commented: Currently, we are focusing on binomial classification with UniformAdaptive histogram.

In progress:

  • Fix bug with different predictions on different machines
  • Benchmark training time and scalability against DRF
  • Fix the binning to go through the data only once, not once for each bin and each feature
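To illustrate the single-pass binning item above, here is a minimal sketch in plain Python. It is not the h2o-3 implementation: the function name `build_histograms`, the equal-width bins, and the assumption that per-feature minima and maxima are already known are all illustrative. The point is only that each row is visited once, and every feature's bin index is computed from the value directly rather than by rescanning the data per bin.

```python
from typing import List, Sequence


def build_histograms(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    mins: Sequence[float],
    maxs: Sequence[float],
    nbins: int,
) -> List[List[List[int]]]:
    """Count binary class labels per (feature, bin) in one pass over the rows.

    Hypothetical helper: assumes numeric features, labels in {0, 1},
    and equal-width bins between mins[f] and maxs[f].
    """
    n_features = len(mins)
    # Bin width per feature, computed once up front.
    widths = [(maxs[f] - mins[f]) / nbins for f in range(n_features)]
    # counts[f][b] = [number of class-0 rows, number of class-1 rows]
    counts = [[[0, 0] for _ in range(nbins)] for _ in range(n_features)]

    for row, label in zip(rows, labels):  # the only pass over the data
        for f, value in enumerate(row):
            if widths[f] == 0:
                b = 0  # constant feature: everything lands in the first bin
            else:
                # The maximum value would map to index nbins, so clamp it.
                b = min(int((value - mins[f]) / widths[f]), nbins - 1)
            counts[f][b][label] += 1
    return counts


rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]]
labels = [0, 0, 1, 1]
hist = build_histograms(rows, labels, mins=[0.0, 10.0], maxs=[3.0, 40.0], nbins=2)
```

The cost is proportional to rows × features, independent of the number of bins.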

Major stuff on roadmap:

  • Support categorical splits
  • Support NaNs splits
  • Support other types of histograms:
      ◦ UniformAdaptive (done)
      ◦ Random
      ◦ QuantilesGlobal
      ◦ RoundRobin
  • Support Tree printing
  • Support multinomial classification
  • Support regression
  • Support scoring in training process
  • Support grid search
  • Support weights
  • Support early stopping
  • Make findBestSplit parallel
  • Support cross validation
  • Support MOJO
  • Support Machine Learning Interpretability stuff: Shapley, PDP plot,…
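For the findBestSplit item in the roadmap above, the following sketch shows what a histogram-based split search for binomial classification can look like. It is an assumption-laden illustration, not the h2o-3 code: the names `gini` and `find_best_split`, the input layout `counts[f][b] = [n_class0, n_class1]`, and the choice of weighted Gini impurity as the criterion are all hypothetical.

```python
from typing import List, Optional, Tuple


def gini(n_class0: int, n_class1: int) -> float:
    """Gini impurity of a node with the given binary class counts."""
    total = n_class0 + n_class1
    if total == 0:
        return 0.0
    p = n_class1 / total
    return 2.0 * p * (1.0 - p)


def find_best_split(
    counts: List[List[List[int]]],
) -> Optional[Tuple[int, int, float]]:
    """Return (feature, last_left_bin, weighted_gini) of the best split, or None.

    counts[f][b] = [n_class0, n_class1] for feature f and bin b
    (an assumed layout, chosen for this sketch).
    """
    best = None
    for f, bins in enumerate(counts):  # each feature is scanned independently
        total0 = sum(b[0] for b in bins)
        total1 = sum(b[1] for b in bins)
        n = total0 + total1
        left0 = left1 = 0
        for b in range(len(bins) - 1):  # candidate split: bins 0..b go left
            left0 += bins[b][0]
            left1 += bins[b][1]
            right0, right1 = total0 - left0, total1 - left1
            n_left, n_right = left0 + left1, right0 + right1
            if n_left == 0 or n_right == 0:
                continue  # a split must send rows to both children
            score = (n_left * gini(left0, left1) + n_right * gini(right0, right1)) / n
            if best is None or score < best[2]:
                best = (f, b, score)
    return best


# Feature 0 does not separate the classes; feature 1 separates them perfectly.
example = [[[1, 1], [1, 1]], [[2, 0], [0, 2]]]
best = find_best_split(example)
```

Because the outer loop touches each feature's histogram independently, the per-feature scans are a natural unit of parallel work, with a final reduction that picks the lowest score.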

Minor stuff to be done:

  • Restructure code into a StandardDecisonTree class and try to guess the interface for a shallow tree
  • Add parameters nbins, mtries, custom_metric, min_rows, (nbins_cats), min_split_improvement, categorical_encoding, check_constant_response
  • Investigate the nbins_top_level parameter
  • API test, save-load test
  • Benchmark of training time compared to DRF
  • Benchmark of prediction performance against DRF, Decision Tree (done)
  • Sphinx documentation
  • Enhance code documentation (Python examples, R examples…)

Other

  • Support sample_rate_per_class, col_sample_rate_change_per_level, checkpoint

exalate-issue-sync[bot], May 11 '23 15:05

Wendy Wong commented: For this JIRA, the focus is on:

  • binary classification trees only
  • numerical predictors only
  • Python client support
  • R client support

For the performance comparison, please also add an R single decision tree. Wendy Wong can assist in this effort.

exalate-issue-sync[bot], May 11 '23 15:05

JIRA Issue Details

Jira Issue: PUBDEV-8691
Assignee: Yuliia Syzon
Reporter: Adam Valenta
State: In Progress
Fix Version: 3.42.0.1
Attachments: N/A
Development PRs: Available

h2o-ops, May 14 '23 18:05

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/6182
https://github.com/h2oai/h2o-3/pull/6447

h2o-ops, May 14 '23 18:05

Next steps:

  1. add NA support;
  2. add multinomial classification;
  3. add regression.

wendycwong, Jun 03 '24 13:06