forest-confidence-interval
confidence interval on train data
Hello,
I am trying to compare the RF error on train and test data for a regression problem. After fitting the model (rf_model), I estimate the error as follows (np is numpy):
test_std = np.sqrt(fci.random_forest_error(rf_model, train_dataset.X, test_dataset.X))
train_std = np.sqrt(fci.random_forest_error(rf_model, train_dataset.X, train_dataset.X))
But the results show that the standard deviation for the test data is much smaller than the standard deviation for the train data, which is weird. Is this because I am passing the same dataset for both arguments when calculating the standard deviation for the training data?
I would really appreciate some help.
Thank you!
Hello,
Good question, I was wondering the same thing. I would appreciate some help as well.
Thank you
@BSharmi can you please share the full dataset and training of your case?
Did you try using a large number of n_estimators to avoid calibration? This could be an artifact of the calibration (which is quite fragile IMHO).
Is the result you obtain (average stdev on test smaller than on train) consistent across different train/test splits, or does it depend on the split?
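To make the split question concrete, here is a rough sketch of how the check could be scripted. It is only a harness, not forestci code: split_indices, compare_std_across_splits and the std_fn callback are hypothetical names made up for illustration. std_fn is assumed to take the train row indices and the rows to evaluate, and to return one standard deviation per evaluated row.

```python
import random
from statistics import mean


def split_indices(n_samples, test_fraction, seed):
    """Shuffle 0..n_samples-1 and cut it into (train, test) index lists."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return idx[n_test:], idx[:n_test]


def compare_std_across_splits(std_fn, n_samples, n_splits=5, test_fraction=0.25):
    """Repeat the train-vs-test comparison over several random splits.

    std_fn(train_idx, eval_idx) is a hypothetical callback: it should fit
    on the train rows and return a list of per-row standard deviations
    for the rows in eval_idx.
    """
    results = []
    for seed in range(n_splits):
        train_idx, test_idx = split_indices(n_samples, test_fraction, seed)
        train_std = mean(std_fn(train_idx, train_idx))  # evaluated on train rows
        test_std = mean(std_fn(train_idx, test_idx))    # evaluated on held-out rows
        results.append((seed, train_std, test_std, test_std < train_std))
    return results
```

In your case std_fn would presumably refit the forest on the train rows and wrap the same np.sqrt(fci.random_forest_error(...)) call from the first post. If the last field of each tuple is True for every seed, the effect does not depend on the split.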