Getting a different result when training a model again with the best selected hyperparameters
Hi,
I am using hyperopt.fmin to find the best hyperparameters. I've used PCA for feature selection and SVM for classification, and combined them in a pipeline:
from sklearn import pipeline, preprocessing, decomposition
from sklearn.svm import SVC

def getModelInstance(self):
    # Scale features, reduce dimensionality with PCA, then classify with a linear SVM
    model = pipeline.Pipeline([
        ('scaler', preprocessing.StandardScaler()),
        ('reducer', decomposition.PCA(random_state=1)),
        ('estimator', SVC(kernel='linear', class_weight='balanced', probability=True))
    ])
    return model
Then I did 3-fold cross-validation:
from sklearn import model_selection

train_loss = model_selection.cross_val_score(model, x_train, y_train,
                                             cv=3, scoring='neg_log_loss')
Then I apply hyperopt.fmin to find the best hyperparameters (n_components for PCA and C for SVM) that give the minimum log loss.
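For reference, the search step roughly looks like the sketch below. This is only a minimal sketch under assumptions: the search-space bounds, max_evals, and the way getModelInstance is reached here are illustrative, not my original code.

from hyperopt import fmin, tpe, hp, STATUS_OK
import numpy as np
from sklearn import model_selection

# Hypothetical search space; the actual bounds are not shown above
space = {
    'n_components': hp.quniform('n_components', 2, 60, 1),
    'C': hp.loguniform('C', np.log(1e-3), np.log(1e3)),
}

def objective(params):
    model = getModelInstance()  # assuming the pipeline factory shown above is callable here
    model.set_params(reducer__n_components=int(params['n_components']),
                     estimator__C=params['C'])
    scores = model_selection.cross_val_score(model, x_train, y_train,
                                             cv=3, scoring='neg_log_loss')
    # hyperopt minimizes its objective, so return the positive log loss
    return {'loss': -scores.mean(), 'status': STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)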
But I have a question: when I take the best C and n_components, set them in the model, and train it again, the neg_log_loss is not the same as what I got before (in the hyperopt.fmin result). For example:
I got my best hyperparameters:
n_components = 41, C = 0.485879250931, log_loss = 0.581865148897
I set the parameters here:
model = pipeline.Pipeline([
    ('scaler', preprocessing.StandardScaler()),
    ('reducer', decomposition.PCA(n_components=41, random_state=1)),
    ('estimator', SVC(C=0.485879250931, kernel='linear', class_weight='balanced', probability=True))
])
return model
and do the 3-fold cross-validation, but I get a different neg_log_loss and don't know why.
Looks like the difference could come from the data shuffling inside the SVC. Set the random_state parameter of the SVC to 1 as well in both models; I think you will get the same neg_log_loss then.
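Something like the sketch below (a minimal sketch of the suggested change; with probability=True the SVC runs an internal cross-validation for Platt scaling, and random_state presumably controls that shuffling):

from sklearn import pipeline, preprocessing, decomposition
from sklearn.svm import SVC

model = pipeline.Pipeline([
    ('scaler', preprocessing.StandardScaler()),
    ('reducer', decomposition.PCA(n_components=41, random_state=1)),
    ('estimator', SVC(C=0.485879250931, kernel='linear', class_weight='balanced',
                      probability=True, random_state=1))  # seed the SVC in both models
])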