Finding the Value of a Mult-arm Policy Tree and Gain (HTE) of Each Node

Open shafayetShafee opened this issue 11 months ago • 2 comments

Hello, GRF labs team, thank you for building and maintaining ML tools for causal inference. I have recently started using the packages - {grf}, {policytree} and {maq} for a multi-arm treatment data. However, I need guidance on,

How can I measure the benefit of a multi-arm policy tree, that is, what is the expected uplift if I allocate the treatments as suggested by the policy tree?
How can I calculate the HTE (or uplift) for each suggested leaf of the multi-arm policytree?

Consider the following reprex from the {policytree} docs,

library(grf)
library(DiagrammeR)
library(policytree)

set.seed(1344)

n <- 5000 # number of subjects
p <- 10 # number of pre-treatment covariates
data <- gen_data_mapl(n, p)
head(data.frame(data)[1:6])

  action          Y        X.1        X.2          X.3        X.4
1      1  2.0905685 0.46864412 0.22784132 0.6269076453 0.71743214
2      1 -0.3635110 0.42637207 0.74486955 0.2168520605 0.25096641
3      2  1.1952524 0.70800587 0.03636718 0.0007285422 0.01940693
4      2  1.8925130 0.05426782 0.97814957 0.5647130643 0.45693730
5      2 -0.4199647 0.95205213 0.53340813 0.3522940478 0.44700733
6      0  2.6126626 0.69570475 0.77983017 0.9636860283 0.72605917

X <- data$X
Y <- data$Y
W <- as.factor(data$action)

multi.forest <- multi_arm_causal_forest(X, Y, W, seed = 1344)

Gamma.matrix <- double_robust_scores(multi.forest)
head(Gamma.matrix)

             0          1          2
[1,]  2.640066  2.1543402  1.2385549
[2,]  1.365158 -2.1944712  1.4777855
[3,] -1.276021  0.6792919  0.8427064
[4,]  2.511125  1.8532989  4.4776106
[5,]  2.708066  2.1123636 -6.4939299
[6,]  9.127422  1.8865874  1.4614259

train <- sample(1:n, 3500)
opt.tree <- policy_tree(X[train, ], Gamma.matrix[train, ], depth = 2)
opt.tree

policy_tree object 
Tree depth:  2 
Actions:  1: 0 2: 1 3: 2 
Variable splits: 
(1) split_variable: X5  split_value: 0.600704 
  (2) split_variable: X7  split_value: 0.334204 
    (4) * action: 3 
    (5) * action: 1 
  (3) split_variable: X7  split_value: 0.784358 
    (6) * action: 2 
    (7) * action: 3

Get the policy for the held out data,

X.test <- X[-train, ]
pi_hat <- predict(opt.tree, X.test)
head(pi_hat)

[1] 3 1 2 2 2 2

Determining the value of the Policy Tree

Following the eqn. (11) from this paper, value of the policy, $\widehat V(\pi) = \frac{1}{n} \sum_{i=1}^{n}\langle \hat \pi(X_i), \widehat \Gamma_i \rangle$ where the policy $\pi(X_i)$ is a $K$-dimensional vector where the $k^{th}$ element is equal to 1 if arm $k$ is assigned, and 0 otherwise and $\widehat \Gamma_i$ is the estimated DR score (eqn. 13).

# creating the pi hat matrix
pi_mat <- matrix(
  data = 0, 
  nrow = length(pi_hat), 
  ncol = length(unique(pi_hat)) - 1
)

pi_mat[cbind(1:length(pi_hat), pi_hat - 1)] <- 1
colnames(pi_mat) <- c("action-1", "action-2")
head(pi_mat)

     action-1 action-2
[1,]        0        1
[2,]        0        0
[3,]        1        0
[4,]        1        0
[5,]        1        0
[6,]        1        0

# getting the DR scores
gamma_diff <- grf::get_scores(multi.forest)[, ,]
head(gamma_diff)

          1 - 0      2 - 0
[1,] -0.4857258 -1.4015111
[2,] -3.5596292  0.1126275
[3,]  1.9553132  2.1187276
[4,] -0.6578262  1.9664855
[5,] -0.5957024 -9.2019960
[6,] -7.2408342 -7.6659957

# calculating value
mean(pi_mat * gamma_diff[-train, ])

[1] 0.5233645

So can I say, “Expected benefit per subject of the above treatment allocation would be 0.52 units and total benefit would be 1000 * 0.52 = 520 units” ??
Am I on the right track to answering my first question?
Also, How can I reach the answer to my second question?

Jan 09 '25 14:01 shafayetShafee