h2o-3 icon indicating copy to clipboard operation
h2o-3 copied to clipboard

PUBDEV-8461: Standardizing and improving algorithm parameter sections

Open hannah-tillman opened this issue 2 years ago • 4 comments

For: PUBDEV-8461 & PUBDEV-8049

There are three goals within this PR:

  1. Restructure the parameters
  2. Expand and clarify the values of parameters
  3. Standardize the style

Restructuring: I separated the parameters into common and hyperparameters. The order now follows more closely the R documentation order of importance. I lead with all required params. I went through the schemas to find gridable=True for params that don't have a page in the appendix to figure out whether they were hyperparams or not. Please let me know if any of these are incorrect.

Expanding: I compared the information in the Python & R docs to what was in the user guide to expand on some params that were lacking info. I also got some outside help :)

Standardizing: Because so many different hands have written these parameter lists, there was not a lot of cohesion on their style. I started to standardize it here (e.g. made all "input-able" values code backticks instead of bolded or bare and created vertical lists when four or more objects were listed in a row).

I would appreciate any and all input. I would especially appreciate algorithm owners double-checking to make sure I got the hyperparameters correct. Please let me know if you have any questions or critiques!

I've included a screenshot of what Aggregator looks like when built to get a feel of what the initial idea looks like: Screen Shot 2022-07-15 at 12 10 12 PM

note: I'm not including Infogram, AutoML, Model Explainability, or miscellaneous algos in this PR since they're structured a little differently. After this initial batch gets solidified, I will make a new PR for changes in those if & where needed.

hannah-tillman avatar Jul 15 '22 17:07 hannah-tillman

Extended Isolation Forest ✅

valenad1 avatar Sep 15 '22 17:09 valenad1

I have mixed feelings about the parameter sections. No doubt that we need to put an order in it 💯

When I am thinking about the Common parameters and Hyperparameters sections in general, the way I understand those sections is that Common parameters should contain only the common parameters through all available algos. Is it correct? Without a doubt common parameters are model_id, x, training_frame, y (If algo is supervised),... what bothers me are parameters like contamination (Isolation Forest), pca_method (PCA),... because they are to say the least algo specific and IMHO belong to hyperparameter section (or some Algo Specific hyperparameters section).

The second thought I have is that we already have this page:

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#supported-grid-search-hyperparameters

about the supported grid search hyperparameters. I would rather see the page updated (E.g. I forgot to add Extended Isolation Forest section there)and the information included in algos pages rather than copy this information under Hyperparameter. Because hyperparameter is not the same as "gridable" parameter. And not all algos support Grid search (E.g., Uplift RF)

I would suggest removing These parameters can be used in grid search. and rather update and segment grid search page to by able to include supported grid search parameters inside algo documentation pages. We can have the information twice but we need to write it only on one place.

What do you think?

valenad1 avatar Sep 15 '22 17:09 valenad1

Adam:

I agree with you that we need to put in some effort to help Hannah finish her work. Having a common parameters section for all algos is great. Then, for each algo, we list out the algo specific parameters.

Regarding gridsearch, my first confusion is that there is no consistency. Some parameters are gridable in one algo but not gridable in another. I think gridsearch parameter can also be divided into two sections: paramaters common to all algos, then algorithm specific parameters.

The problem here is we need to help Hannah to figure this out. I know all of us are busy but somehow this needs to be done.

W

wendycwong avatar Sep 16 '22 14:09 wendycwong

agreed @wendycwong - let's make this a priority after the 3.38 release

michalkurka avatar Sep 16 '22 18:09 michalkurka