h2o-3

Model attributes not being populated in Python grid search models

Open · exalate-issue-sync[bot] opened this issue 1 year ago · 3 comments

If you train a grid, the model parameters (attributes of the {{H2OGradientBoostingEstimator}} class) are not set on the resulting models.

Example:

{code}
import h2o
h2o.init()

csv_url = "https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
prostate = h2o.import_file(csv_url)

prostate['CAPSULE'] = prostate['CAPSULE'].asfactor()
prostate['RACE'] = prostate['RACE'].asfactor()
prostate['DCAPS'] = prostate['DCAPS'].asfactor()
prostate['DPROS'] = prostate['DPROS'].asfactor()

x = list(range(2, 9))
y = 1
{code}

Train a non-grid model:

{code}
# import H2O GBM
from h2o.estimators.gbm import H2OGradientBoostingEstimator

model = H2OGradientBoostingEstimator(distribution='bernoulli', ntrees=100,
                                     max_depth=4, learn_rate=0.1, nfolds=5,
                                     keep_cross_validation_predictions=True)

model.train(x=x, y=y, training_frame=prostate)

model.nfolds  # this is 5
{code}

However, if we train models via grid search, the model params are blank:

{code}
ntrees_opt = [5, 50, 100]
max_depth_opt = [2, 3, 5]
learn_rate_opt = [0.1, 0.2]

hyper_params = {'ntrees': ntrees_opt, 'max_depth': max_depth_opt, 'learn_rate': learn_rate_opt}

from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.gbm import H2OGradientBoostingEstimator

gbm_grid = H2OGridSearch(H2OGradientBoostingEstimator(nfolds=5, keep_cross_validation_predictions=True),
                         hyper_params=hyper_params)

gbm_grid.train(x=x, y=y, training_frame=prostate)

gbm_grid[0].nfolds  # this is currently blank, and should be 5
{code}

This happens for all the model parameters: they are all blank on models trained via grid search.

exalate-issue-sync[bot], May 13 '23 18:05

Lauren DiPerna commented: currently, if you want to extract an attribute from a grid-search model (for example, the first model), you have to do:

{code}
print(sorted_grid[0].params['lambda']['actual'][0])
{code}

but you should be able to do the following (which you can do in R):

{code}
model_1 = h2o.get_model("Grid_GLM_py_4_sid_9dd9_model_python_1481244882436_2_model_4")

model_1.show()

model_1.alpha
{code}
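Until the attributes are populated, the nested-dict access above can be wrapped in a small helper. This is a hypothetical convenience function, not part of the h2o API; the `MockModel` class below is a stand-in with the same `params` shape as a grid-search model, so the sketch runs without an H2O cluster.

```python
def get_actual_param(model, name):
    """Return the actual (post-training) value of a named model parameter.

    Assumes model.params is a dict of {name: {'default': ..., 'actual': ...}},
    the shape exposed by grid-search models in the report above.
    """
    entry = model.params.get(name)
    if entry is None:
        raise KeyError("model has no parameter named %r" % name)
    return entry.get('actual')


# Minimal stand-in for a grid-search model (illustrative values only).
class MockModel:
    params = {'lambda': {'default': [0.0], 'actual': [0.001]},
              'alpha': {'default': [0.5], 'actual': [0.25]}}


print(get_actual_param(MockModel(), 'alpha'))  # [0.25]
```

With a real grid, `get_actual_param(sorted_grid[0], 'lambda')` would stand in for the verbose `sorted_grid[0].params['lambda']['actual']` access.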

Here is another snippet to run to see the issue:

{code}
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors = boston.columns[:-1]

# set the response column to "medv", the median value of owner-occupied homes in $1000's
response = "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston['chas'] = boston['chas'].asfactor()

# split into train and validation sets
train, valid = boston.split_frame(ratios=[.8], seed=1234)

# try using the alpha parameter:
# initialize the estimator, then train the model
boston_glm = H2OGeneralizedLinearEstimator(alpha=.25, seed=1234)
boston_glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# print the mse for the validation set
print(boston_glm.mse(valid=True))

# grid over alpha

# import grid search
from h2o.grid.grid_search import H2OGridSearch

# select the values for alpha to grid over
hyper_params = {'alpha': [0, .25, .5, .75, .1]}

# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space,
# use random grid search instead: {'strategy': "RandomDiscrete"}

# initialize the GLM estimator
boston_glm_2 = H2OGeneralizedLinearEstimator(nfolds=5, seed=1234)

# build grid search with the previously made GLM and hyperparameters
grid = H2OGridSearch(model=boston_glm_2, hyper_params=hyper_params,
                     search_criteria={'strategy': "Cartesian"})

# train using the grid
grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# sort the grid models by mse
sorted_grid = grid.get_grid(sort_by='mse', decreasing=False)
print(sorted_grid)

# this works
print(boston_glm.alpha)

# get the type
print(type(sorted_grid))

# can you get the model summary from a model extracted from the grid search?
sorted_grid[0]  # answer: yes

sorted_grid[0].params['lambda']['actual'][0]  # this is the only way to get the attributes
{code}
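A related workaround is to flatten the whole `params` dict into a plain dict of actual values, approximating what the missing model attributes would expose. This is an illustrative sketch (the function and mock class names are not h2o API); `MockGridModel` mimics the nested `params` shape so the code runs without an H2O cluster.

```python
def actual_params(model):
    """Collect every parameter's 'actual' value into a flat dict.

    Assumes model.params maps each name to {'default': ..., 'actual': ...},
    the shape seen on grid-search models in the snippet above.
    """
    return {name: entry.get('actual')
            for name, entry in model.params.items()}


# Stand-in object with the same nested-dict shape as model.params
# (illustrative values only).
class MockGridModel:
    params = {'alpha': {'default': [0.5], 'actual': [0.25]},
              'nfolds': {'default': 0, 'actual': 5}}


print(actual_params(MockGridModel()))
```

With a real grid, `actual_params(sorted_grid[0])` would give one place to look up what `sorted_grid[0].alpha`, `sorted_grid[0].nfolds`, etc. should report once the attributes are populated.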

exalate-issue-sync[bot], May 13 '23 18:05

Lauren DiPerna commented: this is not an issue in R, using the following code:

{code}
library(h2o)
h2o.init()

# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors <- colnames(boston)[1:13]

# set the response column to "medv", the median value of owner-occupied homes in $1000's
response <- "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston["chas"] <- as.factor(boston["chas"])

# split into train and validation sets
boston.splits <- h2o.splitFrame(data = boston, ratios = .8, seed = 1234)
train <- boston.splits[[1]]
valid <- boston.splits[[2]]

# try using the alpha parameter:
# train your model, where you specify alpha
boston_glm <- h2o.glm(x = predictors, y = response, training_frame = train,
                      validation_frame = valid, alpha = .25, seed = 1234)

# print the mse for the validation set
print(h2o.mse(boston_glm, valid = TRUE))

# grid over alpha

# select the values for alpha to grid over
hyper_params <- list(alpha = c(0, .25, .5, .75, .1))

# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space,
# use random grid search instead: list(strategy = "RandomDiscrete")

# build grid search with the previously made GLM and hyperparameters
grid <- h2o.grid(x = predictors, y = response, training_frame = train,
                 validation_frame = valid, algorithm = "glm", grid_id = "boston_grid",
                 hyper_params = hyper_params,
                 search_criteria = list(strategy = "Cartesian"), seed = 1234)

# sort the grid models by mse
sortedGrid <- h2o.getGrid("boston_grid", sort_by = "mse", decreasing = FALSE)
sortedGrid
{code}

you can do:

{code}
sortedGrid@model_ids
model_1 = h2o.getModel("boston_grid_model_4")
model_1@allparameters$prior
{code}

If you do the same thing in Python, it doesn't work.

exalate-issue-sync[bot], May 13 '23 18:05

JIRA Issue Migration Info

Jira Issue: PUBDEV-2465
Assignee: New H2O Bugs
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

DinukaH2O, May 15 '23 10:05