isds2020

Inverted validation curve

Open · aabk-bkaa opened this issue 5 years ago · 1 comment

After fitting our model, it appears that our validation curve is inverted:

[Image: validation curve]

The validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.

The model was fitted with the following code:

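```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from tqdm import tqdm

# X (features DataFrame) and y (target Series) are loaded earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)

folds = KFold(n_splits=5)
MSE_list = []

for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []              # training RMSE per fold
    MSE_list_intermediate = []  # validation RMSE per fold

    for train_index, val_index in tqdm(folds.split(X_train, y_train)):
        X_tr, y_tr = X_train.iloc[train_index], y_train.iloc[train_index]
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]

        # Fit once per fold, then score on the held-out fold and on the rows used for fitting
        pipe_preproc.fit(X_tr, y_tr)
        MSE_list_intermediate.append(mse(y_val, pipe_preproc.predict(X_val)) ** (1/2))
        MSE_train.append(mse(y_tr, pipe_preproc.predict(X_tr)) ** (1/2))

    MSE_list.append([_lambda] + MSE_list_intermediate
                    + [np.mean(MSE_list_intermediate)] + [np.mean(MSE_train)])

MSE = pd.DataFrame(MSE_list)
# "Mean_RMSE" is the mean validation RMSE; "Mean_RMSE_Evaluation" is the mean training RMSE
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5", "Mean_RMSE", "Mean_RMSE_Evaluation"]

MSE.to_excel("LASSO_output.xlsx")
```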
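For comparison, scikit-learn's `validation_curve` computes both curves in one call: for every alpha it refits the pipeline on each training fold and scores that fit on the same training rows and on the held-out rows, which removes any ambiguity about which column is which. A minimal sketch, assuming synthetic data from `make_regression` in place of the real `X_train`/`y_train`, and the same pipeline and lambda grid as in the loop above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the real X_train / y_train (assumption: real data not available here)
X_train, y_train = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)

lambdas = np.logspace(0, 8, 12)
pipe = make_pipeline(PolynomialFeatures(2), StandardScaler(), Lasso(max_iter=1000))

# For each alpha, refit on every training fold and score on both the
# training fold and the held-out fold
train_scores, val_scores = validation_curve(
    pipe, X_train, y_train,
    param_name="lasso__alpha", param_range=lambdas,
    cv=KFold(n_splits=5), scoring="neg_root_mean_squared_error",
)

# The scorer returns negated RMSE (higher is better), so flip the sign back
train_rmse = -train_scores.mean(axis=1)
val_rmse = -val_scores.mean(axis=1)

for lam, tr, va in zip(lambdas, train_rmse, val_rmse):
    print(f"lambda={lam:14.1f}  train RMSE={tr:9.3f}  val RMSE={va:9.3f}")
```

Plotting `train_rmse` and `val_rmse` against `lambdas` (log scale on the x-axis) gives the two curves with labels that follow directly from the variable names. Small alphas may trigger a `ConvergenceWarning` with `max_iter=1000`; that does not stop the run.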

Can anybody help us?

Kind regards,
Anton and Søren

aabk-bkaa · Aug 25 '20 08:08

Hi @aabk-bkaa, assuming you did not plot the data or label the curves incorrectly, there are other possible reasons for the RMSE being lower on the validation data than on the training data. See: https://stats.stackexchange.com/questions/187335/validation-error-less-than-training-error

jsr-p · Aug 25 '20 09:08
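One commonly cited reason is sampling variability: with a small validation fold, its RMSE can fall below the training RMSE of the same fold purely by chance. The NumPy sketch that follows is a minimal illustration under stated assumptions (a pure-noise target, 60 samples, 5-fold splits, and a mean-only predictor standing in for a Lasso fit whose alpha is large enough to zero every coefficient); it counts how often a fold's validation RMSE lands below its training RMSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n_repeats, n_samples, n_splits = 200, 60, 5
val_below_train = 0
total_folds = 0

for _ in range(n_repeats):
    # Pure-noise target (assumption); the "model" is the training-fold mean,
    # which is what a Lasso fit with an intercept predicts once all coefficients are zero
    y = rng.normal(size=n_samples)
    folds = np.array_split(rng.permutation(n_samples), n_splits)

    for k in range(n_splits):
        val_idx = folds[k]
        tr_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])

        pred = y[tr_idx].mean()
        train_rmse = np.sqrt(np.mean((y[tr_idx] - pred) ** 2))
        val_rmse = np.sqrt(np.mean((y[val_idx] - pred) ** 2))

        val_below_train += int(val_rmse < train_rmse)
        total_folds += 1

share = val_below_train / total_folds
print(f"validation RMSE < training RMSE in {share:.0%} of {total_folds} folds")
```

The exact share depends on the sample size, fold size, and random seed, so treat it as a qualitative illustration rather than an estimate for this dataset.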