L1 of continuous variables seems to be incorrect?

Open mkiang opened this issue 9 years ago • 7 comments

When running the following sample code on simulated data:

library(cem)  # provides imbalance()

n_trt = 1000
n_untrt = 1000
prob_trt_male = .75
prob_untrt_male = .15
# Draw the number of males in each arm; the remainder are females
n_trt_males = sum(rbinom(n_trt, 1, prob_trt_male))
n_trt_females = n_trt - n_trt_males
n_untrt_males = sum(rbinom(n_untrt, 1, prob_untrt_male))
n_untrt_females = n_untrt - n_untrt_males


fake <- data.frame(trt = c(rep(0, n_untrt), rep(1, n_trt)), 
                   sex = c(rep(0, n_untrt_females), rep(1, n_untrt_males), 
                           rep(0, n_trt_females), rep(1, n_trt_males)))
fake$old <- NA
fake$bp <- NA
# old and bp depend on the trt/sex combination
for (i in seq_along(fake$trt)) {
    if (fake$trt[i] + fake$sex[i] == 2){
        # treated males
        fake$old[i] <- rbinom(1, 1, .1)
        fake$bp[i] <- rnorm(1, 100, 20)
    } else if (fake$trt[i] + fake$sex[i] == 0) {
        # untreated females
        fake$old[i] <- rbinom(1, 1, .9)
        fake$bp[i] <- rnorm(1, 280, 50)
    } else {
        # treated females and untreated males
        fake$old[i] <- rbinom(1, 1, .5)
        fake$bp[i] <- rnorm(1, 175, 8)
    }
}

# Compare covariate balance between the treated and untreated groups
imbalance(fake$trt, fake, drop = "trt")

Returns this output:

Multivariate Imbalance Measure: L1=0.954
Percentage of local common support: LCS=13.0%

Univariate Imbalance Measures:

    statistic   type    L1      min     25%      50%      75%      max
sex   -0.5850 (diff) 0.585  0.00000   0.000  -1.0000  -1.0000   0.0000
old    0.6410 (diff) 0.641  0.00000   1.000   1.0000   1.0000   0.0000
bp   143.4229 (diff) 0.000 85.02359 121.272 158.6741 153.2892 233.4842

The L1 of bp is 0.000, which seems impossible given that the difference in means is about 143. I can reproduce this with almost any continuous variable in R, but when I run the same analysis in Stata, the L1 values (both univariate and multivariate) appear to be calculated correctly.
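As a sanity check on what a nonzero value should look like, here is a minimal sketch that computes a univariate L1 by hand, using the usual definition of half the summed absolute difference between the two groups' relative frequencies over shared bins. The helper name l1_univariate and the choice of 10 equal-width bins are my own assumptions for illustration; this is not how cem chooses its breaks, so the number will not match imbalance() exactly.

```r
# Hand-rolled univariate L1 (hypothetical helper, not part of cem).
# ASSUMPTION: n_bins equal-width bins over the pooled range of x;
# cem picks its own breaks, so values will differ from imbalance().
l1_univariate <- function(x, group, n_bins = 10) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Relative frequency of each bin within each group; table() on a factor
  # keeps empty levels, so f and g always line up bin by bin.
  f <- prop.table(table(bins[group == 1]))
  g <- prop.table(table(bins[group == 0]))
  sum(abs(f - g)) / 2
}

set.seed(1)
trt <- rep(c(0, 1), each = 1000)
# Two clearly separated continuous distributions, loosely like bp above
bp <- c(rnorm(1000, 280, 50), rnorm(1000, 100, 20))
l1_bp <- l1_univariate(bp, trt)
print(l1_bp)
```

With distributions this far apart the hand-computed value should be close to 1, and it should be 0 only when the two groups have identical binned distributions.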

mkiang Nov 05 '15 20:11