mitml icon indicating copy to clipboard operation
mitml copied to clipboard

account for colon in labels of interaction effects

Open TDJorgensen opened this issue 6 years ago • 3 comments

Hello Simon (and Alexander and Oliver),

I met you all at IMPS in Switzerland last summer, and enjoyed speaking to you all about missing-data topics. I really appreciate this package, as I was previously writing my own routines for consultation.

I noticed that specifying slopes for interaction terms in testConstraints() returned an error, and I see that the problem is that the values of slopes are assigned to objects with the names of their effects by evaluating them in a new environment. As a quick fix, I added another instance of gsub below your replacement of (Intercept) with Intercept, which replaces the colon (:) with _.x._ to indicate product terms. The colon was producing a range between two different slopes when evaluated (e.g., x:y would be evaluated as the slope of x through the slope of y rather than the x:y slope). I doubt any effects would ever be named with _.x._, so this should not cause problems. It also return it back to the familiar colon so that the label is preserved in the print() output. Adapting your help-page example:

data(studentratings)
fml <- MathDis + ReadDis + SchClimate ~ (1|ID)
imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
implist <- mitmlComplete(imp, print=1:5)
# fit regression model with moderating effect of sex
fit.lm <- with(implist, lm(SchClimate ~ Sex*(ReadDis + MathDis)))
testEstimates(fit.lm)
# test null hypothesis of no interaction
cons <- c("SexGirl:ReadDis","SexGirl:MathDis")
testConstraints(fit.lm, constraints=cons)

I will utilize this package in an upcoming missing-data workshop at the APS conference in San Fransisco (May 27, 2018). Would it be possible to upload this version to CRAN before then? If not, I can show them how to install my version from GitHub using devtools::install_github().

Thank you! Terrence

TDJorgensen avatar Apr 19 '18 12:04 TDJorgensen

One more thought:

This fix only solves the issue of interaction terms labeled with a colon, but there are other functions that could appear in a formula. Those functions would then be applied to the values of the slopes, causing incorrect test statistics without the user realizing it (no error message). For example, if someone added a quadratic term or log of a predictor:

data(studentratings)
fml <- MathDis + ReadDis + SchClimate ~ (1|ID)
imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
implist <- mitmlComplete(imp, print=1:5)
# fit regression model with moderating effect of sex
fit.lm1 <- with(implist, lm(SchClimate ~ ReadDis + MathDis + I(ReadDis^2) + log(MathDis)))
testEstimates(fit.lm1)
# test null hypothesis of no interaction
cons1 <- c("I(ReadDis^2)","log(MathDis)")
testConstraints(fit.lm1, constraints=cons1)
## no error, but not correct! 

That test is not correct because it is testing whether the log of the linear MathDis slope and the square of the linear ReadDis slope are zero, rather than testing whether the effects of nonlinear predictors themselves are zero. Currently, users would need to create those variables manually in the data.frame to include the effects as variables with names that can be assigned to objects:

## compare to adding squared terms to data set
implist <- within.mitml.list(implist, read2 <- ReadDis^2)
implist <- within.mitml.list(implist, logMath <- log(MathDis)
# same estimates
fit.lm2 <- with(implist, lm(SchClimate ~ ReadDis + MathDis + read2 + logMath))
testEstimates(fit.lm2)
# different test result
cons2 <- c("read2","logMath")
testConstraints(fit.lm2, constraints=cons2)

I think a potential solution would be to assign these values to a list instead of assigning them to objects in a new environment. That way, the list elements can have the same names (character strings) as the names of the coefficient (e.g., coef() output). And you can treat the list essentially as an environment using with or within. In the context of your source code, I do not know whether your call to numericDeriv() would work the same. But it might be worth trying just to see if it will ensure correct tests in the most general case. If you are too busy, I can try this on my branch when I have some more time to experiment, and make another pull request.

TDJorgensen avatar Apr 19 '18 12:04 TDJorgensen

Hi Terrence!

thanks for bringing this up! Both cases seem like bugs to me, but I think a general solution is not trivial.

Problem:

testConstraints() uses a separate environment for evaluating the specified constraints because this allows to apply arbitrary R function to the parameters for maximum flexibility. However, some things seem to cause problems:

  1. For constraints about interactions like "SexGirl:ReadDis", eval() instead evaluates "SexGirl" (i.e., the main effect of sex, dropping the part after the :) even though an object with the requested name is available.
  2. For constraints about transformed variables like "log(MathDis)" in models that also contain the main effect "MathDis", eval() instead evaluates the log of "MathDis" even though an object of the requested name is availabe.

Porting this to use lists instead of environments would not change this behavior because lists get converted to temporary environments during eval().

Problem 1) would be easy to solve by replacing : with some other character string in both the environment and the constraint. However, this would not work for 2). In addition, 2) is particularly tricky because both tests, for log(MathDis) as well as for the log of MathDis, should be possible. Both options are reasonable in that case, but R opts for the wrong thing by default.

More generally, 2) presents an ambiguity that R cannot resolve on it's own, and implementations trying to overwrite R's default behavior would be very complex and prone to error, I think.

Proposed Solution:

Both problems can be solved by wrapping the ambiguous expressions in "backticks" (aka "grave accent"). This way, eval() correctly resolves the names of the parameters mentioned in the constraints.

Example 1 (interaction effect):

fit.lm <- with(implist, lm(SchClimate ~ Sex*(ReadDis + MathDis)))

cons <- c("`SexGirl:ReadDis`","`SexGirl:MathDis`")
testConstraints(fit.lm, constraints=cons)

# Call:
# 
# testConstraints(model = fit.lm, constraints = cons)
# 
# Hypothesis test calculated from 5 imputed data sets. The following
# constraints were specified:
# 
#  `SexGirl:ReadDis`
#  `SexGirl:MathDis`
# 
# Combination method: D1 
# 
#    F.value     df1     df2   P(>F)     RIV 
#      0.296       2  51.853   0.745   0.305 
# 
# Unadjusted hypothesis test as appropriate in larger samples. 

fit.lm.null <- with(implist, lm(SchClimate ~ Sex + ReadDis + MathDis))
testModels(fit.lm, fit.lm.null)

# Call:
# 
# testModels(model = fit.lm, null.model = fit.lm.null)
# 
# Model comparison calculated from 5 imputed data sets.
# Combination method: D1 
# 
#    F.value     df1     df2   P(>F)     RIV 
#      0.296       2  51.853   0.745   0.305 
# 
# Unadjusted hypothesis test as appropriate in larger samples. 

Example 2 (log-transformed variable)

fit.lm1 <- with(implist, lm(SchClimate ~ ReadDis + MathDis + I(ReadDis^2) + log(MathDis)))
testEstimates(fit.lm1)

# Call:
#
# testEstimates(model = fit.lm1)
#
# Final parameter estimates and inferences obtained from 5 imputed data sets.
#
#               Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI
# (Intercept)      0.718     0.368     1.952    32.254     0.060     0.544     0.389
# ReadDis         -0.022     0.252    -0.085    44.746     0.932     0.427     0.328
# MathDis          0.265     0.126     2.102    38.318     0.042     0.477     0.356
# I(ReadDis^2)     0.057     0.049     1.164    32.206     0.253     0.544     0.389
# log(MathDis)     0.006     0.237     0.024    76.391     0.981     0.297     0.248
#
# Unadjusted hypothesis test as appropriate in larger samples.

testConstraints(fit.lm1, constraints="`I(ReadDis^2)`")

# Call:
#
# testConstraints(model = fit.lm1, constraints = "`I(ReadDis^2)`")
#
# Hypothesis test calculated from 5 imputed data sets. The following
# constraints were specified:
#
#  `I(ReadDis^2)`
#
# Combination method: D1
#
#    F.value     df1     df2   P(>F)     RIV
#      1.354       1  32.206   0.253   0.544
#
# Unadjusted hypothesis test as appropriate in larger samples.

testConstraints(fit.lm1, constraints="`log(MathDis)`")

# Call:
#
# testConstraints(model = fit.lm1, constraints = "`log(MathDis)`")
#
# Hypothesis test calculated from 5 imputed data sets. The following
# constraints were specified:
#
#  `log(MathDis)`
#
# Combination method: D1
#
#    F.value     df1     df2   P(>F)     RIV
#      0.001       1  76.391   0.981   0.297
#
# Unadjusted hypothesis test as appropriate in larger samples.

Conclusion:

I think that a general solution to this problem is not completely possible due to the ambiguity inherent in some contraints. The solution above is probably fine. However, I will include this in the documentation and prepare some examples, perhaps include warnings or errors for common mistakes. I would be willing to consider including your pull request for the interaction terms, though I wonder if it wouldn't be more useful to try and automatize the placement of backticks around parameter names in some cases (e.g., interactions) and make users aware of the general problem. For your workshops, you could go with the CRAN version + backticks.

What do you think?

Best wishes, Simon

simongrund1 avatar Apr 19 '18 16:04 simongrund1

I had not considered the fact that the same functions applicable to variable names in a formula could also appear in the constraints. That does complicate things. But I think the backticks are an elegant solution. I will use those in my workshop. If you plan to add this example to the help page and make a note about it there (and perhaps in your vignettes), I don't think it is necessary to accept my pull request just to replace the colons, unless you think that would be useful as a convenience (and colons would never be used as a sequence-operator in constraints). Thanks for the quick and informative reply, Terrence

TDJorgensen avatar Apr 19 '18 17:04 TDJorgensen