cem: L1 of continuous variables seems to be incorrect?

Open mkiang opened this issue 9 years ago • 7 comments

Running the following sample code on simulated data:

library(cem)  # provides imbalance()

n_trt = 1000
n_untrt = 1000
prob_trt_male = .75
prob_untrt_male = .15
n_trt_males = sum(rbinom(n_trt, 1, prob_trt_male))
n_trt_females = n_trt - n_trt_males
n_untrt_males = sum(rbinom(n_untrt, 1, prob_untrt_male))  # draw from the untreated group
n_untrt_females = n_untrt - n_untrt_males


fake <- data.frame(trt = c(rep(0, n_untrt), rep(1, n_trt)), 
                   sex = c(rep(0, n_untrt_females), rep(1, n_untrt_males), 
                           rep(0, n_trt_females), rep(1, n_trt_males)))
fake$old <- NA
fake$bp <- NA  # create the column before filling it row by row
for (i in seq_along(fake$trt)) {
    if (fake$trt[i] + fake$sex[i] == 2){
        fake$old[i] <- rbinom(1, 1, .1)
        fake$bp[i] <- rnorm(1, 100, 20)
    } else if (fake$trt[i] + fake$sex[i] == 0) {
        fake$old[i] <- rbinom(1, 1, .9)
        fake$bp[i] <- rnorm(1, 280, 50)
    } else {
        fake$old[i] <- rbinom(1, 1, .5)
        fake$bp[i] <- rnorm(1, 175, 8)
    }
}

imbalance(fake$trt, fake, drop = "trt")

Returns this output:

Multivariate Imbalance Measure: L1=0.954
Percentage of local common support: LCS=13.0%

Univariate Imbalance Measures:

    statistic   type    L1      min     25%      50%      75%      max
sex   -0.5850 (diff) 0.585  0.00000   0.000  -1.0000  -1.0000   0.0000
old    0.6410 (diff) 0.641  0.00000   1.000   1.0000   1.0000   0.0000
bp   143.4229 (diff) 0.000 85.02359 121.272 158.6741 153.2892 233.4842

The L1 of bp is 0.000, which seems impossible given that the group means differ by about 143. I can replicate this with almost any continuous variable in R, but when I run this in Stata, the L1 (both univariate and multivariate) appears to be calculated correctly.
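
As a sanity check, here is a minimal base-R sketch of a univariate L1 computed by hand: bin the variable on break points shared by both groups, then take half the summed absolute difference between the two relative-frequency histograms. The helper `l1_univariate` and its `n_bins` argument are hypothetical and not part of cem, and the equal-width binning is an assumption that may differ from whatever `imbalance()` chooses internally. The simulated `bp` values only loosely mirror the two extreme groups above.

```r
# Hypothetical helper (not part of cem): univariate L1 on a shared binning.
l1_univariate <- function(x, group, n_bins = 10) {
  # Common break points spanning both groups, so the bins are comparable.
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Relative frequency of each bin within each group (each column sums to 1).
  freq <- prop.table(table(bins, group), margin = 2)
  # L1 is half the summed absolute difference between the two histograms.
  0.5 * sum(abs(freq[, 1] - freq[, 2]))
}

set.seed(1)
group <- rep(c(0, 1), each = 1000)
# Assumed toy data: two well-separated continuous distributions.
bp <- c(rnorm(1000, 280, 50), rnorm(1000, 100, 20))

# Well-separated groups should give an L1 close to 1, not 0.
l1_bp <- l1_univariate(bp, group)

# Two identical copies of the data should give an L1 of exactly 0.
l1_identical <- l1_univariate(c(bp, bp), rep(c(0, 1), each = length(bp)))

print(c(l1_bp = l1_bp, l1_identical = l1_identical))
```

Under this definition, an L1 of 0 would mean the two groups have identical binned distributions, which is hard to reconcile with the quantile differences reported for bp.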

Nov 05 '15 20:11 mkiang
