[QUESTION] Chapter 2 - Using GridSearch result
I'm tripping up on a few things at the end of chapter 2... at the moment I'm trying to get my head around the GridSearch result.
Referring to the Jupyter notebook...
Line 94
forest_reg = RandomForestRegressor(n_estimators=100, random_state=42)
n_estimators=100, which gives a result of 50,182.
Later in the text we use Grid Search to automatically change hyperparameters.
Line 99
{'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
The best result is 49,682 (max_features: 8, n_estimators: 30)
If you change n_estimators on line 94 to 30 you get 50,696. It makes sense to me that it's bigger than the default in line 94, but why is the result from Grid Search smaller?
I've tried changing line 99 to
{'n_estimators': [30, 100], 'max_features': [8]},
and this gives the results
49682.273345071546 {'max_features': 8, 'n_estimators': 30}
49219.71678391268 {'max_features': 8, 'n_estimators': 100}
My question is, what do you do with the result of the Grid Search? Why doesn't n_estimators give the same result both times? How does max_features fit in with regard to line 94?
Thank you!
Hi @koxt2 ,
Thanks for your question.
The RandomForestRegressor on line 94 doesn't specify max_features, so it uses the default, which is the total number of features in the training set. In this case, housing_prepared has 16 columns, so max_features is 16. If you explicitly set max_features=8 on line 94, then line 96 will output a score much closer to the one on line 99.
But there's also a second difference: line 96 uses cv=10 while line 99 uses cv=5. If you change line 96 to use cv=5, you'll get the exact same result.
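You can check this yourself. Below is a minimal sketch of the idea: it uses a synthetic dataset in place of housing_prepared (so the actual numbers won't match the book's), but it shows that GridSearchCV's best_score_ and a standalone cross_val_score agree once the hyperparameters and the cv value are identical:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for housing_prepared (16 features, as in the book).
X, y = make_regression(n_samples=300, n_features=16, noise=10.0,
                       random_state=42)

# Grid search restricted to a single combination, with cv=5 as on line 99.
param_grid = [{'n_estimators': [30], 'max_features': [8]}]
grid_search = GridSearchCV(RandomForestRegressor(random_state=42),
                           param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# Standalone cross-validation with the SAME hyperparameters and SAME cv=5.
forest_reg = RandomForestRegressor(n_estimators=30, max_features=8,
                                   random_state=42)
scores = cross_val_score(forest_reg, X, y,
                         scoring='neg_mean_squared_error', cv=5)

# Both are the mean negative MSE over the same 5 folds, so they match.
print(grid_search.best_score_, scores.mean())
```

If you instead keep cv=10 on one side and cv=5 on the other, the two means are computed over different fold splits and will generally differ, which is exactly the mismatch you observed.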
Hope this helps!
Hi @koxt2,
- Grid search is used to find the optimal hyperparameters of a model, i.e. the combination that results in the most 'accurate' predictions. Within the given range of values for each parameter, it will find the best combination.
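As for what to do with the result: with scikit-learn's defaults (refit=True), GridSearchCV refits the winning combination on the whole training set for you, so you can use best_estimator_ directly as your final model. A minimal sketch, again with a synthetic dataset standing in for the prepared housing data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for housing_prepared / housing_labels.
X, y = make_regression(n_samples=200, n_features=16, noise=10.0,
                       random_state=42)

# Same grid as line 99 in the notebook.
param_grid = [{'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}]
grid_search = GridSearchCV(RandomForestRegressor(random_state=42),
                           param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# The winning hyperparameter combination:
print(grid_search.best_params_)

# best_estimator_ has already been refitted on the full training set
# (refit=True by default), so it is ready to make predictions:
final_model = grid_search.best_estimator_
predictions = final_model.predict(X[:5])
```

In the book, this final_model is what you would then evaluate on the test set.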