h2o-3
h2o-3 copied to clipboard
PUBDEV-8461: Standardizing and improving algorithm parameter sections
For: PUBDEV-8461 & PUBDEV-8049
There are three goals within this PR:
- Restructure the parameters
- Expand and clarify the values of parameters
- Standardize the style
Restructuring: I separated the parameters into common and hyperparameters. The order now follows more closely the R documentation order of importance. I lead with all required params. I went through the schemas to find gridable=True
for params that don't have a page in the appendix to figure out whether they were hyperparams or not. Please let me know if any of these are incorrect.
Expanding: I compared the information in the Python & R docs to what was in the user guide to expand on some params that were lacking info. I also got some outside help :)
Standardizing: Because so many different hands have written these parameter lists, there was not a lot of cohesion on their style. I started to standardize it here (e.g. made all "input-able" values code backticks instead of bolded or bare and created vertical lists when four or more objects were listed in a row).
I would appreciate any and all input. I would especially appreciate algorithm owners double-checking to make sure I got the hyperparameters correct. Please let me know if you have any questions or critiques!
I've included a screenshot of what Aggregator looks like when built to get a feel of what the initial idea looks like:
note: I'm not including Infogram, AutoML, Model Explainability, or miscellaneous algos in this PR since they're structured a little differently. After this initial batch gets solidified, I will make a new PR for changes in those if & where needed.
Extended Isolation Forest ✅
I have mixed feelings about the parameter sections. No doubt that we need to put an order in it 💯
When I am thinking about the Common parameters and Hyperparameters sections in general, the way I understand those sections is that Common parameters should contain only the common parameters through all available algos. Is it correct? Without a doubt common parameters are model_id, x, training_frame, y (If algo is supervised),... what bothers me are parameters like contamination (Isolation Forest), pca_method (PCA),... because they are to say the least algo specific and IMHO belong to hyperparameter section (or some Algo Specific hyperparameters section).
The second thought I have is that we already have this page:
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#supported-grid-search-hyperparameters
about the supported grid search hyperparameters. I would rather see the page updated (E.g. I forgot to add Extended Isolation Forest section there)and the information included in algos pages rather than copy this information under Hyperparameter. Because hyperparameter is not the same as "gridable" parameter. And not all algos support Grid search (E.g., Uplift RF)
I would suggest removing These parameters can be used in grid search.
and rather update and segment grid search page to by able to include supported grid search parameters inside algo documentation pages. We can have the information twice but we need to write it only on one place.
What do you think?
Adam:
I agree with you that we need to put in some effort to help Hannah finish her work. Having a common parameters section for all algos is great. Then, for each algo, we list out the algo specific parameters.
Regarding gridsearch, my first confusion is that there is no consistency. Some parameters are gridable in one algo but not gridable in another. I think gridsearch parameter can also be divided into two sections: paramaters common to all algos, then algorithm specific parameters.
The problem here is we need to help Hannah to figure this out. I know all of us are busy but somehow this needs to be done.
W
agreed @wendycwong - let's make this a priority after the 3.38 release