randomForestCI
prediction and variance for new data
I would like to obtain a variance estimate (via randomForestInfJack) for a new observation, that is, one that was not in X when the random forest was created with randomForest(X, Y, keep.inbag=T). It is not clear to me whether 1) this should be done at all, or 2) if it is valid, whether the new observation should be added to the original data set X before running randomForestInfJack(). Dan O.
Here's an example with confidence intervals on new observations:
library(randomForest)
library(randomForestCI)
# Make some data...
n = 250
p = 100
X = matrix(rnorm(n * p), n, p)
Y = rnorm(n)
# Run the method
rf = randomForest(X, Y, ntree = 2000, keep.inbag = TRUE)
n.test = 100
X.test = matrix(rnorm(n.test * p), n.test, p)
ij = randomForestInfJack(rf, X.test, calibrate = TRUE)
Thanks for the quick response. What if I wanted to predict Y and its variance for one new data point where I have measured the set of X's but not the Y? Since I only have one new observation, and the function can't take just one row of data, would it be best to append (rbind) the new observation to the original X?
Hmm yeah ideally the code would let you specify just one prediction point; that looks like a bug we should fix.
In the meantime, yes, appending it to the original X should get the job done.
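A minimal sketch of that workaround, reusing the simulated data from the example above. Here x.new is a hypothetical stand-in for the single measured observation, and the sketch assumes that randomForestInfJack returns one row of output per row of the matrix it is given, in the same order:

```r
library(randomForest)
library(randomForestCI)

# Same simulated training data as in the example above
n = 250
p = 100
X = matrix(rnorm(n * p), n, p)
Y = rnorm(n)
rf = randomForest(X, Y, ntree = 2000, keep.inbag = TRUE)

# Hypothetical single new observation: X's measured, Y unknown (1 x p matrix)
x.new = matrix(rnorm(p), 1, p)

# Workaround: stack the new point under the training rows and predict for all of them
X.aug = rbind(X, x.new)
ij.aug = randomForestInfJack(rf, X.aug, calibrate = TRUE)

# Assumption: output rows follow the input order, so the new point is the last row
ij.new = tail(ij.aug, 1)
print(ij.new)
```

Only the last row is of interest; the rows for the original X are predictions on training data and are discarded here.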
@dtoalm do you have the latest version of the randomForestCI package? If I understood your question correctly, this has already been fixed. A couple of weeks ago I made an adjustment to allow one-row predictions, and it was merged into the master branch. Changing @swager's example to use test data consisting of one row works (except for calibration):
n.test = 1
X.test = matrix(rnorm(n.test * p), n.test, p)
ij = randomForestInfJack(rf, X.test, calibrate = TRUE)
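To go from a variance estimate to a confidence interval, one option is a normal approximation around the prediction. The sketch below is an illustration rather than part of the package: normal.ci is a hypothetical helper, and the column names y.hat and var.hat mentioned in the comments are an assumption about the output of randomForestInfJack that should be checked with names(ij). Negative variance estimates, which can occur with the bias-corrected infinitesimal jackknife when calibration is off, are truncated at zero.

```r
# Hypothetical helper (not part of randomForestCI): normal-approximation
# interval from a point prediction and its estimated variance
normal.ci = function(y.hat, var.hat, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)     # two-sided normal quantile
  se = sqrt(pmax(var.hat, 0))        # truncate negative variance estimates at 0
  data.frame(y.hat = y.hat, lower = y.hat - z * se, upper = y.hat + z * se)
}

# Stand-in values so the sketch runs on its own; with the package loaded this
# would be normal.ci(ij$y.hat, ij$var.hat), assuming those column names
ci = normal.ci(y.hat = c(0, 1), var.hat = c(1, 4))
print(ci)
```

The interval is only as good as the normal approximation and the variance estimate behind it, so for a single new point it is best read as a rough indication.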