
confidence interval on train data

Open BSharmi opened this issue 7 years ago • 2 comments

Hello,

I am trying to compare the RF error on train and test data for a regression problem. After fitting the model (rf_model), I estimate the error as follows (np is numpy):

test_std = np.sqrt(fci.random_forest_error(rf_model, train_dataset.X, test_dataset.X))
train_std = np.sqrt(fci.random_forest_error(rf_model, train_dataset.X, train_dataset.X))

But the results show that the standard deviation for the test data is very small compared to the standard deviation for the train dataset, which is weird. Is it because I am passing the same dataset as both arguments when calculating the standard deviation for the training dataset?

I would really appreciate some help.

Thank you!

BSharmi avatar Jul 24 '18 00:07 BSharmi

Hello,

Good question, I was wondering the same thing. Some help would be appreciated from my side as well.

Thank you

bngksgl avatar Feb 27 '19 14:02 bngksgl

@BSharmi can you please share the full dataset and training code for your case? Did you try using a large number of n_estimators to avoid calibration? This could be an artifact of the calibration (which is quite fragile IMHO). Is the result you obtain (average stdev on test smaller than on train) consistent across different train/test splits, or does it depend on the split?

danieleongari avatar Sep 13 '22 08:09 danieleongari
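
For anyone wanting to run the checks suggested in the last comment, here is a minimal sketch, not a confirmed fix. `compare_over_splits` and `forestci_estimate_std` are hypothetical helper names introduced only for illustration. The forestci call assumes the form used in this issue (training data passed as the second argument) together with the `calibrate` keyword. Newer forestci releases may expect the training-set shape instead, so check the installed version. `X` and `y` are assumed to be NumPy arrays.

```python
import random
from statistics import mean


def compare_over_splits(estimate_std, n_samples, n_splits=5, test_fraction=0.25, seed=0):
    """Repeat the train/test comparison over several random splits.

    estimate_std(train_idx, test_idx) is a caller-supplied (hypothetical) callable
    returning two sequences: per-sample std on the train rows and on the test rows.
    Returns a list of (mean_train_std, mean_test_std) tuples, one per split.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    results = []
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_std, test_std = estimate_std(train_idx, test_idx)
        results.append((mean(train_std), mean(test_std)))
    return results


def forestci_estimate_std(X, y, n_estimators=2000):
    """Build an estimate_std callable backed by scikit-learn and forestci.

    Assumption: the call form from this issue (X_train as the second argument);
    adjust it if your forestci version expects the training-set shape.
    """
    # Imported lazily so the split harness above works without these packages.
    import numpy as np
    import forestci as fci
    from sklearn.ensemble import RandomForestRegressor

    def estimate_std(train_idx, test_idx):
        X_train, X_test = X[train_idx], X[test_idx]
        rf_model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        rf_model.fit(X_train, y[train_idx])
        # calibrate=False skips the calibration step mentioned in the comment above.
        var_train = fci.random_forest_error(rf_model, X_train, X_train, calibrate=False)
        var_test = fci.random_forest_error(rf_model, X_train, X_test, calibrate=False)
        # Guard against negative variance estimates, which np.sqrt would turn into NaN.
        return (np.sqrt(np.clip(var_train, 0, None)),
                np.sqrt(np.clip(var_test, 0, None)))

    return estimate_std
```

Usage would look like `compare_over_splits(forestci_estimate_std(X, y), n_samples=len(X))`. If the mean test std is smaller than the mean train std on every split, the effect is probably not a quirk of one particular split.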