forest-confidence-interval
V_IJ_unbiased is zero when X_test contains a single sample
Hi. I am trying to get a CI for a new x value that was not in the training set.
    X_train.shape = (2000, 1)
    X_test.shape = (1, 1)
    X_test = [10]
    V_IJ_unbiased = fci.random_forest_error(model, X_train, X_test)
However, this returns V_IJ_unbiased = [0.].
But if I use an X_test of shape (2, 1), for example X_test = [10, 23], everything is fine.
How can I get the CI for a single new X_test value?
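For reference, here is a minimal sketch of what I am doing. The data and RandomForestRegressor settings are placeholders (only the shapes match my real case), and the exact output depends on the installed forestci version:

    import numpy as np
    import forestci as fci
    from sklearn.ensemble import RandomForestRegressor

    # Placeholder 1-D training data with the same shape as mine
    rng = np.random.RandomState(0)
    X_train = rng.uniform(0, 20, size=(2000, 1))
    y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=2000)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    # A single unseen test point: the variance estimate comes back as [0.]
    X_test = np.array([[10.0]])
    print(fci.random_forest_error(model, X_train, X_test))

    # With two test points the estimates look reasonable
    X_test_two = np.array([[10.0], [23.0]])
    print(fci.random_forest_error(model, X_train, X_test_two))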
I think the issue is a bug on line 239 of forestci.py. That line should be replaced with

    pred_mean = np.mean(pred, 1)
    pred_centered = (pred.T - pred_mean).T

because we want to average over the bootstrap dimension, not the test dimension.
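To make the axis issue concrete, here is a small self-contained numpy example with made-up numbers, showing why centering over the test dimension collapses to zeros when there is only one test sample:

    import numpy as np

    # pred has shape (n_test_samples, n_trees), as produced inside forestci
    pred = np.array([[2.0, 3.0, 4.0]])     # one test sample, three trees

    # Centering over axis 0 (the test dimension): with a single test sample
    # the mean equals the row itself, so everything cancels and V_IJ becomes 0.
    print(pred - np.mean(pred, 0))          # [[0. 0. 0.]]

    # Centering over axis 1 (the tree / bootstrap dimension), per sample:
    pred_mean = np.mean(pred, 1)
    pred_centered = (pred.T - pred_mean).T
    print(pred_centered)                    # [[-1.  0.  1.]]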
@agrawalraj I also ran into this issue some time ago.
The fix I found back then is similar to yours. While debugging the function those lines come from, I saw that pred has the following dimensions:

0: the test samples
1: the prediction of each tree

It is the result of this line:

    pred = np.array([tree.predict(X_test) for tree in forest]).T

Because of that layout, averaging over the wrong dimension for a single test sample does not return the expected result. Either way, it makes no sense to average the predictions of different samples for the same tree; we want to average the predictions of all trees of the forest for one sample.
This fixed it for me:

    pred_mean = np.mean(pred, 1).reshape(X_test.shape[0], 1)

Nothing else had to be changed.
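For what it's worth, a quick numpy sketch with made-up values shows that this reshape-based centering and the transpose-based version above give the same result:

    import numpy as np

    pred = np.array([[2.0, 3.0, 4.0],      # test sample 0: predictions of three trees
                     [5.0, 7.0, 9.0]])     # test sample 1

    # Transpose-based centering from the previous comment
    centered_a = (pred.T - np.mean(pred, 1)).T

    # Reshape-based centering: the (n_test, 1) column broadcasts against pred
    centered_b = pred - np.mean(pred, 1).reshape(pred.shape[0], 1)

    assert np.allclose(centered_a, centered_b)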
Maybe it would be beneficial to include the single-test-sample case (as in LOOCV) in the test suite, for example along the lines sketched below.
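Something like this could serve as a regression test; the test name is hypothetical, and the calibrate keyword and exact random_forest_error signature may differ between forestci versions:

    import numpy as np
    import forestci as fci
    from sklearn.ensemble import RandomForestRegressor

    def test_single_test_sample():
        # Regression test for the single-sample (LOOCV-like) case discussed above
        rng = np.random.RandomState(0)
        X_train = rng.uniform(0, 20, size=(200, 1))
        y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=200)
        forest = RandomForestRegressor(n_estimators=50, random_state=0)
        forest.fit(X_train, y_train)

        X_test = np.array([[10.0]])
        # calibrate=False keeps the test focused on the raw variance estimate
        V_IJ = fci.random_forest_error(forest, X_train, X_test, calibrate=False)
        assert V_IJ.shape == (1,)
        assert V_IJ[0] > 0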
We'd welcome a pull request