cem
L1 of continuous variables seems to be incorrect?
When running the following sample code on simulated data:
library(cem)  # provides imbalance()

# Group sizes and the probability of being male in each group
n_trt = 1000
n_untrt = 1000
prob_trt_male = .75
prob_untrt_male = .15
n_trt_males = sum(rbinom(n_trt, 1, prob_trt_male))
n_trt_females = n_trt - n_trt_males
n_untrt_males = sum(rbinom(n_untrt, 1, prob_untrt_male))
n_untrt_females = n_untrt - n_untrt_males
# Untreated rows come first, then treated; within each group, females (0) then males (1)
fake <- data.frame(trt = c(rep(0, n_untrt), rep(1, n_trt)),
                   sex = c(rep(0, n_untrt_females), rep(1, n_untrt_males),
                           rep(0, n_trt_females), rep(1, n_trt_males)))
fake$old <- NA
fake$bp <- NA
for (i in seq_along(fake$trt)) {
  if (fake$trt[i] + fake$sex[i] == 2) {
    # Treated males: mostly young, low bp
    fake$old[i] <- rbinom(1, 1, .1)
    fake$bp[i] <- rnorm(1, 100, 20)
  } else if (fake$trt[i] + fake$sex[i] == 0) {
    # Untreated females: mostly old, high bp
    fake$old[i] <- rbinom(1, 1, .9)
    fake$bp[i] <- rnorm(1, 280, 50)
  } else {
    # Treated females and untreated males
    fake$old[i] <- rbinom(1, 1, .5)
    fake$bp[i] <- rnorm(1, 175, 8)
  }
}
imbalance(fake$trt, fake, drop = "trt")
Returns this output:
Multivariate Imbalance Measure: L1=0.954
Percentage of local common support: LCS=13.0%
Univariate Imbalance Measures:
    statistic   type    L1      min     25%      50%      75%      max
sex   -0.5850 (diff) 0.585  0.00000   0.000  -1.0000  -1.0000   0.0000
old    0.6410 (diff) 0.641  0.00000   1.000   1.0000   1.0000   0.0000
bp   143.4229 (diff) 0.000 85.02359 121.272 158.6741 153.2892 233.4842
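The L1 of bp is 0.000, which seems impossible: the treated and untreated bp distributions clearly differ (the mean difference is 143.4 and every reported quantile difference is above 85). I can replicate this with almost any continuous variable in R, but when I run the same analysis in Stata, the L1 (both univariate and multivariate) appears to be calculated correctly.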
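As a sanity check on the bp figure, here is a minimal sketch of a univariate L1 computed by hand in base R. It assumes the usual definition (half the sum of absolute differences between the two groups' binned relative frequencies) and uses equal-width bins. The helper `l1_by_hand` and its `n_bins` argument are hypothetical names, not part of cem, and cem's own binning rules may differ, so the exact value is not expected to match `imbalance()`.

```r
# Hypothetical helper (not part of cem): univariate L1 over equal-width bins.
l1_by_hand <- function(x, group, n_bins = 10) {
  # Common breaks spanning both groups so the bins are comparable.
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Relative frequency of each bin within each group
  # (subsetting a factor keeps all levels, so the tables line up).
  f_trt <- prop.table(table(bins[group == 1]))
  f_ctl <- prop.table(table(bins[group == 0]))
  # Half the sum of absolute differences, so the result lies in [0, 1].
  sum(abs(f_trt - f_ctl)) / 2
}

# Simulated data (an assumption for illustration): two well-separated groups.
set.seed(1)
group <- rep(c(0, 1), each = 500)
x <- c(rnorm(500, 280, 50), rnorm(500, 100, 20))
l1_by_hand(x, group)
```

For two groups this far apart, the hand-computed value should be close to 1 rather than 0. The same helper can be applied to the data above as `l1_by_hand(fake$bp, fake$trt)`.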