forest-confidence-interval

V_IJ_unbiased is zero when X_test contains a single sample

Open · Maxfashko opened this issue on Dec 28 '18 · 3 comments

Hi. I am trying to get a CI for a new x value that was not in the training set.

```
X_train.shape = (2000, 1)
X_test.shape = (1, 1)

X_test = [10]

V_IJ_unbiased = fci.random_forest_error(model, X_train, X_test)
```

This returns V_IJ_unbiased = [0.]. But if I use X_test with shape (2, 1), for example X_test = [10, 23], everything is fine. How can I get the CI for a single new value of X_test?
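For reference, a minimal end-to-end reproduction of this call pattern might look like the following (the data, model, and hyperparameters here are illustrative, not the exact ones above):

```python
import numpy as np
import forestci as fci
from sklearn.ensemble import RandomForestRegressor

# Illustrative training data: 2000 samples, one feature
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(2000, 1))
y_train = X_train.ravel() + rng.normal(scale=0.1, size=2000)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A single, previously unseen test point
X_test = np.array([[10.0]])

# With the affected version this comes back as [0.] when X_test has one row
V_IJ_unbiased = fci.random_forest_error(model, X_train, X_test)
print(V_IJ_unbiased)
```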

Maxfashko · Dec 28 '18 08:12

I think the issue is a bug on line 239 of forestci.py; that line should be replaced by

```python
pred_mean = np.mean(pred, 1)
pred_centered = (pred.T - pred_mean).T
```

because we want to average over the bootstrap dimension, not the test dimension.
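A quick NumPy sketch (not the library's actual code) of why the axis matters when there is only one test sample, assuming `pred` is laid out as `(n_test_samples, n_trees)` as described below:

```python
import numpy as np

# Hypothetical per-tree predictions for one test sample and four trees,
# shaped (n_test_samples, n_trees) = (1, 4)
pred = np.array([[2.0, 3.0, 4.0, 5.0]])

# Averaging over axis 0 (the test dimension) just returns the single row,
# so centering with it gives all zeros, which is why the result collapses to 0:
wrong_centered = pred - np.mean(pred, 0)        # [[0., 0., 0., 0.]]

# Averaging over axis 1 (the bootstrap/tree dimension) centers the sample
# around its forest-average prediction, which is the intended behaviour:
pred_mean = np.mean(pred, 1)
pred_centered = (pred.T - pred_mean).T          # [[-1.5, -0.5, 0.5, 1.5]]
```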

agrawalraj · Feb 03 '20 00:02

@agrawalraj I have also faced this issue some time ago.

The solution I found back then is similar to yours. While debugging the function you quoted from, I found that `pred` has the following dimensions:

- axis 0: the test samples
- axis 1: the predictions of the individual trees

It is the result of this line:

```python
pred = np.array([tree.predict(X_test) for tree in forest]).T
```

Because of these dimensions, the mean calculation for a prediction array with a single sample does not return the expected result. Either way, it does not make sense to average the predictions of different samples for the same tree; we should instead average the predictions of all trees of the forest for one sample.

This fixed it for me:

```python
pred_mean = np.mean(pred, 1).reshape(X_test.shape[0], 1)
```

Nothing else had to be changed.
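For comparison, here is a small standalone sketch (same assumed `(n_test_samples, n_trees)` layout) showing that this reshape-based variant relies on broadcasting and yields the same centered array as the double-transpose version suggested earlier:

```python
import numpy as np

pred = np.array([[2.0, 3.0, 4.0, 5.0]])   # one test sample, four trees

# Keep the per-sample mean as a column vector so it broadcasts over the
# tree axis when subtracted from pred
pred_mean = np.mean(pred, 1).reshape(pred.shape[0], 1)
pred_centered = pred - pred_mean           # [[-1.5, -0.5, 0.5, 1.5]]

# Equivalent to the (pred.T - np.mean(pred, 1)).T form
assert np.allclose(pred_centered, (pred.T - np.mean(pred, 1)).T)
```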

Maybe it would be beneficial to include the single test sample case (as in LOOCV) in the code tests.
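A rough sketch of such a test, with purely illustrative names and data rather than anything from the package's real test suite, could look like this:

```python
import numpy as np
import forestci as fci
from sklearn.ensemble import RandomForestRegressor


def test_single_test_sample():
    # Simple 1-D regression problem
    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 5, size=(200, 1))
    y_train = X_train.ravel() + rng.normal(scale=0.1, size=200)

    forest = RandomForestRegressor(n_estimators=50, random_state=0)
    forest.fit(X_train, y_train)

    # A single test sample, as in the LOOCV-style case above
    X_test = np.array([[2.5]])
    V_IJ_unbiased = fci.random_forest_error(forest, X_train, X_test)

    # One variance estimate per test sample, and it should not collapse to 0
    assert V_IJ_unbiased.shape == (1,)
    assert not np.allclose(V_IJ_unbiased, 0)
```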

cewaphi · Mar 25 '20 21:03

We'd welcome a pull request

arokem · Mar 26 '20 01:03