randomForestCI prediction and variance for new data

Open MNRiverEcologyUnit opened this issue 8 years ago • 4 comments

I would like to obtain a variance estimate (via randomForestInfJack) for a new observation, that is, one that was not in X when the random forest was created with randomForest(X, Y, keep.inbag = T). It is not clear to me whether 1) this should be done at all, or 2) if it is valid, whether the new observation should be added to the original data set X before running randomForestInfJack(). Dan O.

Jan 19 '17 15:01 MNRiverEcologyUnit

Here's an example with confidence intervals on new observations:

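library(randomForest)
library(randomForestCI)

# Make some data...
n = 250
p = 100
X = matrix(rnorm(n * p), n, p)
Y = rnorm(n)

# Fit the forest; keep.inbag = TRUE is required for the variance estimates
rf = randomForest(X, Y, ntree = 2000, keep.inbag = TRUE)

# Variance estimates at new test points that were not in X
n.test = 100
X.test = matrix(rnorm(n.test * p), n.test, p)
ij = randomForestInfJack(rf, X.test, calibrate = TRUE)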
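
The call above returns predictions and variance estimates rather than intervals. A minimal sketch of turning them into normal-approximation confidence intervals follows. It assumes the result is a data frame with columns y.hat and var.hat; the helper name ij_conf_int and the stand-in values are hypothetical and not part of randomForestCI.

```r
# Hypothetical helper (not part of randomForestCI): normal-approximation
# confidence intervals from a data frame with columns y.hat and var.hat.
ij_conf_int = function(ij, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)
  # Variance estimates may come out negative; clamp at zero before sqrt
  se = sqrt(pmax(ij$var.hat, 0))
  data.frame(y.hat = ij$y.hat,
             lower = ij$y.hat - z * se,
             upper = ij$y.hat + z * se)
}

# Stand-in values for illustration only; in practice pass the `ij` object
ij.demo = data.frame(y.hat = c(0.1, -0.3), var.hat = c(0.04, 0.01))
ci = ij_conf_int(ij.demo)
```

With the real output, ij_conf_int(ij) would give one interval per row of X.test. The normal approximation is an assumption of this sketch, not something the thread states.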

Jan 19 '17 15:01 swager

Thanks for the quick response. What if I wanted to predict Y and its variance for one new data point where I have measured the set of X's but not the Y? Since I only have one new observation, and the function can't take just one row of data, would it be best to append/rbind the new observation to all of the original X?

Jan 19 '17 16:01 MNRiverEcologyUnit

Hmm yeah ideally the code would let you specify just one prediction point; that looks like a bug we should fix.

In the meantime, yes, appending it to the original X should get the job done.
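
The append workaround could be sketched roughly as below. The helper ij_for_one_row and the stand-in function fake.ij are hypothetical, used only so the snippet runs without a fitted forest; in practice the function passed in would wrap randomForestInfJack.

```r
# Hypothetical helper: append the new row to X, run `ij_fun` on the
# stacked matrix, and keep only the row that belongs to the new point.
ij_for_one_row = function(ij_fun, X, x.new) {
  X.aug = rbind(X, matrix(x.new, nrow = 1))
  out = ij_fun(X.aug)
  out[nrow(X.aug), , drop = FALSE]
}

# Stand-in for the real call, for illustration only. With a fitted forest:
#   function(Z) randomForestInfJack(rf, Z, calibrate = TRUE)
fake.ij = function(Z) data.frame(y.hat = rowMeans(Z), var.hat = apply(Z, 1, var))

X.demo = matrix(1:6, 3, 2)
x.new = c(10, 20)
res = ij_for_one_row(fake.ij, X.demo, x.new)
```

Only the last row of the result refers to the new observation; the other rows are discarded.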

Jan 19 '17 17:01 swager

@dtoalm do you have the latest version of the randomForestCI package? If I understood your question correctly, it has already been fixed. A couple of weeks ago I made an adjustment to allow one-row predictions, and it was merged into the master branch. Changing the example by @swager to use test data consisting of one row works (except for calibration):

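# One test row; reuses rf and p from the example by @swager
n.test = 1
X.test = matrix(rnorm(n.test * p), n.test, p)
# Calibration does not work with a single row, so it is switched off here
ij = randomForestInfJack(rf, X.test, calibrate = FALSE)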
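
When the single row is taken from an existing matrix rather than generated with matrix(), base R drops the dimensions unless told otherwise, and the result is no longer a one-row matrix. A small illustration, with X.new as a made-up example matrix:

```r
# Made-up data for illustration only
X.new = matrix(rnorm(3 * 5), 3, 5)

row.vec = X.new[1, ]                # plain numeric vector, dimensions dropped
row.mat = X.new[1, , drop = FALSE]  # stays a 1 x 5 matrix
```

Passing row.mat (not row.vec) keeps the input in the same matrix shape as X.test above.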

Jan 21 '17 11:01 alionaBER
