cem
Factor Treatments in Logits
I'm writing because my co-authors and I are using the cem package in R to fit a model with a five-category factor treatment and a dichotomous dependent variable. (Thank you, by the way, for this amazing resource!) The cem() function runs fine, but att() returns an error message that we wanted to ask you about. The message reads:
Error: variable 'LatentClass' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning messages:
1: In eval(family$initialize) : non-integer #successes in a binomial glm!
2: In model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable 'LatentClass' is not a factor
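The first warning looks separate from the factor problem. glm() with a binomial family warns whenever weight × outcome is not a whole number, which is presumably what happens here because matching weights are generally fractional. A minimal base-R sketch with simulated data; `d`, `wts`, and the weight values are hypothetical stand-ins, not the CEM weights themselves:

```r
set.seed(2)
d <- data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5))
wts <- runif(50, 0.5, 2)  # hypothetical fractional weights standing in for CEM weights

# Collect every warning raised while fitting the binomial model
msgs <- character(0)
fit_b <- withCallingHandlers(
  glm(y ~ x, data = d, family = binomial, weights = wts),
  warning = function(cond) {
    msgs <<- c(msgs, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)
print(msgs)  # includes the "non-integer #successes" warning

# quasibinomial skips the whole-number check; the coefficients are identical,
# only the dispersion (and hence the standard errors) is estimated differently
fit_q <- glm(y ~ x, data = d, family = quasibinomial, weights = wts)
all.equal(coef(fit_b), coef(fit_q))
```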
We've confirmed several times with str() that our treatment is indeed a factor. Then we found this GitHub issue claiming that this sort of model specification can't work (https://github.com/IQSS/cem/issues/2):
"When you create a cem object using a factor variable as a treatment, attempting to use att to run a logistic regression on that object fails. It looks like this is happening because of line 372 (using the GitHub formatting) of the att command, tmp.data[, obj$treatment] <- 0. Assigning 0 to the factor variable changes the variable to a numeric, and then the prd <- predict(out, tmp.data, type = "response") command on the following line fails because the treatment variable is the wrong type. To fix this, you might want to change the assignment on line 258 to assign the reference level of the factor variable if the treatment variable is a factor. Alternatively, you could just coerce everything to numeric, or throw a warning if you try to run att with a factor treatment."
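The mechanism described in the quoted issue can be reproduced in base R without cem. A minimal sketch on simulated data (`d`, `trt`, `y`, `bad`, and `ok` are hypothetical names): assigning 0 to a factor column turns it numeric and predict() then fails with the same error as above, while assigning the reference level with the original levels, as the issue suggests, keeps predict() working:

```r
# Hypothetical toy data: a 3-level factor treatment and a binary outcome
set.seed(1)
d <- data.frame(
  trt = factor(sample(c("1", "2", "3"), 60, replace = TRUE)),
  y   = rbinom(60, 1, 0.5)
)
fit <- glm(y ~ trt, data = d, family = binomial)

# What the issue describes: the whole factor column is replaced by numeric zeros
bad <- d
bad[, "trt"] <- 0
class(bad$trt)  # "numeric"
err <- suppressWarnings(  # hide the "is not a factor" warning; keep the error text
  tryCatch(predict(fit, bad, type = "response"),
           error = function(e) conditionMessage(e))
)
print(err)

# The suggested fix: set every unit to the reference level, keeping all levels
ok <- d
ok[["trt"]] <- factor(rep(levels(d$trt)[1], nrow(d)), levels = levels(d$trt))
prd <- predict(fit, ok, type = "response")  # one predicted probability per row
```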
Is it true that cem cannot handle factor variable treatments in a logistic model? If so, do you have a recommended course of action?
Stata also seems to struggle with multilevel treatments (more than two categories): the cem command does not generate the weight variable (cem_weights). We've confirmed that when we recode the treatment variable as binary, cem produces the expected cem_weights and we can run the analysis. Below is the code we used in R and Stata:
In R:
str(matching.df)
#Coarsen fatalities
fat.grp <- list(c("0","1"), c("2", "3"), c("4"), c("5","6"))
#Coarsen Polyarchy
hist(matching.df$s_polyarchy)
polycut <- c(0 , .2, .45, .8, 1)

#matching.df$LatentClass = as.numeric(as.character(matching.df$LatentClass))
str(matching.df)
summary(matching.df)
mat <- cem(treatment = "LatentClass", data = matching.df,
           grouping = list(fatalities_range = fat.grp),
           cutpoints = list(s_polyarchy = polycut),
           eval.imbalance = TRUE, drop = "Enable", baseline.group = "3")
mat
results <- att(mat, Enable ~ LatentClass, data = matching.df, model = "logistic")
In Stata:
* Stata can't run this command with the multilevel treatment:
imbalance indiscrim fatalities_range camp_size ab_internat s_polyarchy, treatment(LatentClass)
* Stata doesn't generate weights with the multilevel treatment:
recode fatalities_range (0 1 = 1) (2 3 = 2) (4 = 3) (5 6 = 4), generate(fatalities)
cem indiscrim fatalities (#0) camp_size ab_internat s_polyarchy (0 , .2, .45, .8, 1), treatment(LatentClass)
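Since recoding the treatment as binary does produce usable weights, one possible workaround (not something the cem authors have endorsed in this thread) is a separate baseline-versus-level comparison for each non-baseline class. A minimal sketch of the data preparation in R, assuming the baseline is "3" as in the cem() call above; `binary_contrasts()`, `treat_bin`, and the toy values are hypothetical:

```r
# Hypothetical helper: one subset per non-baseline level, containing only the
# baseline units and the units at that level, with a 0/1 treatment indicator
binary_contrasts <- function(data, treatment, baseline) {
  lv <- levels(data[[treatment]])
  stopifnot(baseline %in% lv)
  out <- list()
  for (k in setdiff(lv, baseline)) {
    sub <- data[data[[treatment]] %in% c(baseline, k), , drop = FALSE]
    sub$treat_bin <- as.integer(sub[[treatment]] == k)  # 1 = level k, 0 = baseline
    out[[k]] <- sub
  }
  out
}

# Toy data standing in for matching.df (hypothetical values)
toy <- data.frame(
  LatentClass = factor(c("1", "2", "3", "3", "4", "5", "1", "3")),
  Enable      = c(0, 1, 1, 0, 1, 0, 1, 0)
)
subsets <- binary_contrasts(toy, "LatentClass", baseline = "3")
names(subsets)  # "1" "2" "4" "5"

# Each subset could then be matched on its own, dropping the original factor
# as well as the outcome, e.g.:
# cem(treatment = "treat_bin", data = subsets[["1"]],
#     drop = c("Enable", "LatentClass"), ...)
```

Each comparison then has a binary treatment, so the estimates are per-class effects relative to the baseline rather than a single multilevel estimate.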