dbarts
dbarts predicted value less than the sum of the minimum terminal node values
I attach a dbarts object and a test data matrix to this message: dbarts_debug_example.zip
The predicted value from dbarts is -0.977252, which is less than the sum of the minimum terminal node values obtained from each tree.
load(file = "......./debugsampler.RData" )
load(file = "....../debugtempbind.RData" )
n.trees <- sampler$control@n.trees
tempsum <- 0
for (i in 1:n.trees) {
  treeexample1 <- sampler$getTrees(treeNums = i,
                                   chainNums = 1,
                                   sampleNums = 1)
  # Rows with var == -1 are terminal nodes; take the smallest leaf value
  tempsum <- tempsum + min(treeexample1[treeexample1$var == -1, ]$value)
}
tempsum
I think the prediction should be 0.1020762:
getPredictionsForTree <- function(tree, x) {
  predictions <- rep(NA_real_, nrow(x))
  getPredictionsForTreeRecursive <- function(tree, indices) {
    if (tree$var[1] == -1) {
      # Assigns in the calling environment by using <<-
      predictions[indices] <<- tree$value[1]
      return(1)
    }
    goesLeft <- x[indices, tree$var[1]] <= tree$value[1]
    headOfLeftBranch <- tree[-1,]
    n_nodes.left <- getPredictionsForTreeRecursive(
      headOfLeftBranch, indices[goesLeft])
    headOfRightBranch <- tree[seq.int(2 + n_nodes.left, nrow(tree)),]
    n_nodes.right <- getPredictionsForTreeRecursive(
      headOfRightBranch, indices[!goesLeft])
    return(1 + n_nodes.left + n_nodes.right)
  }
  getPredictionsForTreeRecursive(tree, seq_len(nrow(x)))
  return(predictions)
}
getPredictionsForTree(treeexample1, matrix(rep(tempbind[1, 1], 100), 100, 1))
n.trees <- sampler$control@n.trees
tempsum <- 0
for (i in 1:n.trees) {
  treeexample1 <- sampler$getTrees(treeNums = i,
                                   chainNums = 1,
                                   sampleNums = 1)
  # tempsum becomes a length-100 vector (one entry per repeated row)
  tempsum <- tempsum + getPredictionsForTree(treeexample1, matrix(rep(tempbind[1, 1], 100), 100, 1))
}
tempsum
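For anyone who wants to check the traversal logic without the attached data, here is a minimal sketch on a made-up tree. It assumes the same flattened layout that getPredictionsForTree() relies on (nodes stored depth-first, left branch first, with var == -1 marking a terminal node). The names subtree_size, predict_row and toy_tree are hypothetical helpers for illustration and are not part of dbarts.

```r
# Hypothetical helper: number of nodes in the subtree rooted at row `start`,
# assuming a depth-first (pre-order) layout with var == -1 for leaves.
subtree_size <- function(var, start) {
  if (var[start] == -1) return(1L)
  left  <- subtree_size(var, start + 1L)
  right <- subtree_size(var, start + 1L + left)
  1L + left + right
}

# Hypothetical helper: walk one observation down the tree.
predict_row <- function(tree, x_row) {
  i <- 1L
  while (tree$var[i] != -1) {
    if (x_row[tree$var[i]] <= tree$value[i]) {
      i <- i + 1L                                        # left child is the next row
    } else {
      i <- i + 1L + subtree_size(tree$var, i + 1L)       # skip the whole left subtree
    }
  }
  tree$value[i]
}

# Made-up tree: root splits on x1 at 0.5; its left child splits on x2 at 0.
toy_tree <- data.frame(var   = c(1,   2,   -1,   -1,  -1),
                       value = c(0.5, 0.0, -0.2, 0.1, 0.3))

predict_row(toy_tree, c(0.2, -1))  # -0.2
predict_row(toy_tree, c(0.2,  1))  #  0.1
predict_row(toy_tree, c(0.9,  0))  #  0.3

# Every prediction is one of the leaf values, so it cannot fall below the minimum leaf.
min(toy_tree$value[toy_tree$var == -1])  # -0.2
```

Since each tree contributes exactly one leaf value, a sum over trees computed on the same scale as the leaf values cannot be smaller than the sum of the per-tree minima.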
Also, the training data predictions obtained from $predict() differ from those obtained from getPredictionsForTree(). $predict() gives the same predictions as $train after I run the sampler, so I assume the issue is that the tree predictions must be rescaled. How can I obtain the values for the scaling transformation?
sampler$predict( sampler$data@x )
n.trees <- sampler$control@n.trees
predvec <- rep(NA, nrow(sampler$data@x))
for (i in 1:nrow(sampler$data@x)) {
  tempsum <- 0
  for (j in 1:n.trees) {
    treeexample1 <- sampler$getTrees(treeNums = j,
                                     chainNums = 1,
                                     sampleNums = 1)
    # drop = FALSE keeps row i as a 1 x p matrix; as.matrix(x[i, ]) would give a p x 1 column
    tempsum <- tempsum + getPredictionsForTree(treeexample1, sampler$data@x[i, , drop = FALSE])
  }
  predvec[i] <- tempsum
}
predvec
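When a single observation is passed to getPredictionsForTree(), it has to stay a 1 x p matrix, because the function indexes x by row and then by the split variable's column. A small illustration with a made-up matrix (x_demo is a hypothetical name, not part of the attached data) shows how the two ways of extracting a row differ:

```r
# Made-up 2 x 3 matrix standing in for a design matrix with three predictors.
x_demo <- matrix(1:6, nrow = 2, ncol = 3)

# x_demo[1, ] drops to a plain vector, and as.matrix() then makes it a column.
dim(as.matrix(x_demo[1, ]))      # 3 1

# drop = FALSE keeps the row as a one-row matrix with all three columns.
dim(x_demo[1, , drop = FALSE])   # 1 3
```

With the column form, nrow(x) equals the number of predictors rather than 1, and any split on a variable other than the first would index a column that does not exist.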
I think the terminal node values are not on the original scale, so they need to be transformed back from the internal scale, which lies between -0.5 and 0.5, to the original scale.
Is there an option for displaying terminal node values on the original scale?
I think this should be explained in the documentation. I did not see this explained in the vignette on working with dbarts saved tree objects.
tempmax <- max(sampler$data@y)
tempmin <- min(sampler$data@y)
(predvec + 0.5)*(tempmax - tempmin) +tempmin
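The transformation above can be checked on made-up numbers. This is only a sketch under the assumption stated in this post, namely that the response is mapped linearly onto [-0.5, 0.5] using the minimum and maximum of the training response. The helpers to_internal and to_original are hypothetical names, not dbarts functions.

```r
# Hypothetical helpers, assuming a linear min/max scaling of y onto [-0.5, 0.5].
to_internal <- function(y, y_min, y_max) (y - y_min) / (y_max - y_min) - 0.5
to_original <- function(f, y_min, y_max) (f + 0.5) * (y_max - y_min) + y_min

# Made-up response values.
y <- c(2, 4, 10)

f <- to_internal(y, min(y), max(y))
f                                 # -0.50 -0.25  0.50

# Applying the back-transformation recovers the original values.
to_original(f, min(y), max(y))    # 2  4 10
```

If the sum of tree predictions is on the internal scale, applying the equivalent of to_original() once to the summed value (not to each tree separately) gives a prediction on the original scale, which is what the last line of the code above does.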