dbarts predicted value less than the sum of the minimum terminal node values

Open EoghanONeill opened this issue 1 year ago • 2 comments

I have attached a dbarts sampler object and a test data matrix to this message: dbarts_debug_example.zip

The predicted value from dbarts is -0.977252, which is less than the sum of the minimum terminal node values obtained from each tree:

load(file = "......./debugsampler.RData" )

load(file = "....../debugtempbind.RData" )

n.trees <- sampler$control@n.trees

tempsum <- 0

for(i in 1:n.trees){

  treeexample1 <- sampler$getTrees(treeNums = i,
                                   chainNums = 1,
                                   sampleNums = 1)
  
  
  tempsum <- tempsum + min(treeexample1[treeexample1$var== -1, ]$value)
  
}

tempsum

I think the prediction should be 0.1020762
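
For reference, the logic of this check can be illustrated with made-up numbers. If every tree contributes exactly one of its leaf values, the sum over trees can never fall below the sum of the per-tree minima, provided the prediction and the leaf values are on the same scale. A minimal sketch in base R; the leaf values are invented and not taken from the attached sampler:

```r
# Made-up leaf values for three trees, all on the same scale.
leaves <- list(c(-0.10, 0.02, 0.07), c(-0.03, 0.04), c(0.01, 0.05, -0.02))

# Bounds obtained by picking the smallest / largest leaf in every tree.
lower <- sum(vapply(leaves, min, numeric(1)))
upper <- sum(vapply(leaves, max, numeric(1)))

# Every possible sum-of-trees fit: one leaf chosen from each tree.
all_fits <- rowSums(expand.grid(leaves))

range(all_fits)   # stays within the bounds
c(lower, upper)
```

A prediction below `lower` therefore suggests that the prediction and the leaf values are not expressed on the same scale.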




getPredictionsForTree <- function(tree, x) {
  predictions <- rep(NA_real_, nrow(x))
  getPredictionsForTreeRecursive <- function(tree, indices) {
    if (tree$var[1] == -1) {
      # Assigns in the calling environment by using <<-
      predictions[indices] <<- tree$value[1]
      return(1)
    }
    goesLeft <- x[indices, tree$var[1]] <= tree$value[1]
    headOfLeftBranch <- tree[-1,]
    n_nodes.left <- getPredictionsForTreeRecursive(
      headOfLeftBranch, indices[goesLeft])
    headOfRightBranch <- tree[seq.int(2 + n_nodes.left, nrow(tree)),]
    n_nodes.right <- getPredictionsForTreeRecursive(
      headOfRightBranch, indices[!goesLeft])
    return(1 + n_nodes.left + n_nodes.right)
  }

  getPredictionsForTreeRecursive(tree, seq_len(nrow(x)))
  return(predictions)
}
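
To make the traversal easier to follow, here is a small self-contained sketch on a toy tree. It assumes the layout that getPredictionsForTree above relies on: rows are stored depth-first with the left subtree directly after its parent, `var == -1` marks a leaf, and an observation goes left when its value is `<=` the split value. The tree values and the helper name `predict_one` are made up for illustration and are not part of dbarts:

```r
# Toy tree in depth-first (pre-order) layout; var == -1 marks a leaf.
toy_tree <- data.frame(
  var   = c(1L, -1L, 2L, -1L, -1L),
  value = c(0.5, -0.10, 1.5, 0.02, 0.07)
)

# Hypothetical helper: walk the pre-order table for a single observation.
predict_one <- function(tree, x_row) {
  subtree_size <- function(pos) {
    # Number of rows occupied by the subtree rooted at row `pos`.
    if (tree$var[pos] == -1L) return(1L)
    left <- subtree_size(pos + 1L)
    1L + left + subtree_size(pos + 1L + left)
  }
  pos <- 1L
  while (tree$var[pos] != -1L) {
    if (x_row[tree$var[pos]] <= tree$value[pos]) {
      pos <- pos + 1L                           # left child follows its parent
    } else {
      pos <- pos + 1L + subtree_size(pos + 1L)  # skip the whole left subtree
    }
  }
  tree$value[pos]
}

predict_one(toy_tree, c(0.2, 0.0))  # x1 <= 0.5: first leaf
predict_one(toy_tree, c(0.9, 1.0))  # x1 > 0.5, x2 <= 1.5: second leaf
predict_one(toy_tree, c(0.9, 2.0))  # x1 > 0.5, x2 > 1.5: third leaf
```

The returned values are on whatever scale the leaf values are stored in; no rescaling is applied here.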


getPredictionsForTree(treeexample1, matrix(rep(tempbind[1, 1], 100), 100, 1))



n.trees <- sampler$control@n.trees

tempsum <- 0

for(i in 1:n.trees){
  
  treeexample1 <- sampler$getTrees(treeNums = i,
                                   chainNums = 1,
                                   sampleNums = 1)
  
  
  tempsum <- tempsum + getPredictionsForTree(treeexample1, matrix(rep(tempbind[1, 1], 100), 100, 1))
  
}

tempsum

EoghanONeill, Oct 04 '23 07:10

Also, the training-data predictions obtained from $predict() differ from those obtained by summing getPredictionsForTree() over the trees.

predict() gives the same predictions as $train after I run the sampler, so I assume the issue is that the tree predictions must be rescaled. How can I obtain the values for the scaling transformation?


sampler$predict( sampler$data@x )



getPredictionsForTree <- function(tree, x) {
  predictions <- rep(NA_real_, nrow(x))
  getPredictionsForTreeRecursive <- function(tree, indices) {
    if (tree$var[1] == -1) {
      # Assigns in the calling environment by using <<-
      predictions[indices] <<- tree$value[1]
      return(1)
    }
    goesLeft <- x[indices, tree$var[1]] <= tree$value[1]
    headOfLeftBranch <- tree[-1,]
    n_nodes.left <- getPredictionsForTreeRecursive(
      headOfLeftBranch, indices[goesLeft])
    headOfRightBranch <- tree[seq.int(2 + n_nodes.left, nrow(tree)),]
    n_nodes.right <- getPredictionsForTreeRecursive(
      headOfRightBranch, indices[!goesLeft])
    return(1 + n_nodes.left + n_nodes.right)
  }

  getPredictionsForTreeRecursive(tree, seq_len(nrow(x)))
  return(predictions)
}


n.trees <- sampler$control@n.trees

predvec <- rep(NA, nrow(sampler$data@x) )

for(i in 1: nrow(sampler$data@x)){
  
  tempsum <- 0
  for(j in 1: n.trees){
    
    treeexample1 <- sampler$getTrees(treeNums = j,
                                     chainNums = 1,
                                     sampleNums = 1)
    
    # drop = FALSE keeps row i as a 1 x p matrix, as getPredictionsForTree expects
    tempsum <- tempsum + getPredictionsForTree(treeexample1, sampler$data@x[i, , drop = FALSE])
    
  }
  predvec[i] <- tempsum
  
}

predvec

EoghanONeill, Oct 04 '23 09:10

I think the terminal node values are not stored on the original scale of the response but on an internal scale between -0.5 and 0.5, so the summed tree predictions have to be transformed back to the original scale.

Is there an option for displaying terminal node values on the original scale?

I think this should be explained in the documentation; I did not see it in the vignette on working with saved dbarts tree objects.

Back-transformation using the range of the training response:

tempmax <- max(sampler$data@y)
tempmin <- min(sampler$data@y)
(predvec + 0.5)*(tempmax - tempmin) +tempmin
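
A minimal sketch of this transformation and its inverse, assuming a continuous response that is mapped linearly so that its minimum becomes -0.5 and its maximum becomes 0.5 (the assumption made above, not confirmed in this thread). The helper names `to_internal` and `to_original` are hypothetical and not part of dbarts, and the response values are made up:

```r
# Hypothetical helpers; these names are not part of dbarts.
to_internal <- function(y, y_min = min(y), y_max = max(y)) {
  # Map [y_min, y_max] linearly onto [-0.5, 0.5].
  (y - y_min) / (y_max - y_min) - 0.5
}
to_original <- function(z, y_min, y_max) {
  # Inverse map, identical in form to the expression above.
  (z + 0.5) * (y_max - y_min) + y_min
}

y <- c(2.0, 3.5, 5.0, 11.0)   # made-up response values
z <- to_internal(y)

range(z)                                      # endpoints of the internal scale
all.equal(to_original(z, min(y), max(y)), y)  # round trip recovers y
```

Note that the minimum and maximum must come from the training response that the sampler was fitted to, not from the test data.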

EoghanONeill, Oct 04 '23 10:10