StatsModels.jl

enhanced contrast coding features

Open kleinschmidt opened this issue 6 years ago • 1 comments

This is a "meta issue" to keep track of features that would be "nice to have" for contrast coding. I'll update this as other things come up.

Additional contrast types for ordinal variables

It would be useful to support some additional contrast types suited to ordinal variables (where the levels have a well-defined order):

  • [ ] Repeated/successive differences (each contrast tests for difference between two adjacent levels; contr.sdiff from MASS)
  • [ ] Polynomial contrasts (orthogonal polynomials that test for linear, quadratic, etc. trends in the ordered levels).
  • [ ] Check correctness of Helmert coding (I have a nagging suspicion that it might not be scaled quite correctly)
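To pin down what the first two items would produce, here is a minimal sketch in plain Python (not StatsModels code; the function names are made up for illustration). It assumes the successive-difference matrix follows the convention of contr.sdiff from MASS, and it builds polynomial contrasts by Gram-Schmidt on powers of the level index, leaving the columns unnormalized (R's contr.poly scales them to unit length).

```python
from fractions import Fraction


def sdiff_contrasts(n):
    """n x (n-1) successive-difference contrasts (contr.sdiff convention).

    Column j (1-based) is -(n-j)/n for levels 1..j and j/n for levels j+1..n,
    so coefficient j estimates mean(level j+1) - mean(level j).
    """
    return [[Fraction(j - n, n) if i <= j else Fraction(j, n)
             for j in range(1, n)]
            for i in range(1, n + 1)]


def poly_contrasts(n):
    """n x (n-1) orthogonal polynomial contrasts (linear, quadratic, ...).

    Gram-Schmidt on 1, x, x^2, ... with x = 1..n; columns are orthogonal to
    each other and to the constant, but not scaled to unit length.
    """
    basis = [[Fraction(1)] * n]  # constant column, dropped before returning
    for degree in range(1, n):
        col = [Fraction(x) ** degree for x in range(1, n + 1)]
        for b in basis:
            coef = sum(c * v for c, v in zip(col, b)) / sum(v * v for v in b)
            col = [c - coef * v for c, v in zip(col, b)]
        basis.append(col)
    cols = basis[1:]
    # Transpose so that rows are levels and columns are contrasts.
    return [[cols[j][i] for j in range(n - 1)] for i in range(n)]


if __name__ == "__main__":
    for row in sdiff_contrasts(3):
        print([str(v) for v in row])
    for row in poly_contrasts(3):
        print([str(v) for v in row])
```

Exact fractions are used so that the orthogonality of the polynomial columns can be checked without floating-point tolerance.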

Interpretation of coefficients

The (generalized) inverse of the contrasts matrix gives the loading of each condition's mean on each coefficient, which is useful for interpreting the coefficients in terms of the underlying cells. We should provide an interface to extract this that accounts for cases where an intercept column or other contrasts need to be included (e.g., non-centered contrasts or interactions with other contrast-coded variables). Users should also be able to work with the result directly, e.g. to interpret what the intercept coefficient means under different contrast coding schemes.
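As a concrete illustration of the inverse idea, here is a minimal sketch in plain Python (not StatsModels code; the helper names are hypothetical). It assumes the simplest case: a single three-level factor plus an intercept, with the unscaled Helmert matrix in the convention of R's contr.helmert. With the intercept column prepended, the matrix is square, so an ordinary inverse suffices, and row k of the inverse gives the weights on the level means that make up coefficient k.

```python
from fractions import Fraction


def inverse(m):
    """Exact inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def coefficient_loadings(contrasts):
    """Rows: coefficients (intercept first). Columns: level means."""
    with_intercept = [[1] + list(row) for row in contrasts]
    return inverse(with_intercept)


# Unscaled Helmert contrasts for three levels (rows are levels).
helmert3 = [[-1, -1],
            [1, -1],
            [0, 2]]

if __name__ == "__main__":
    for row in coefficient_loadings(helmert3):
        print([str(v) for v in row])
```

Under these assumptions the intercept row is (1/3, 1/3, 1/3), i.e. the grand mean of the level means; the second row is (-1/2, 1/2, 0); and the third is (-1/6, -1/6, 1/3). So the second Helmert coefficient is the difference between level 3 and the mean of levels 1 and 2, divided by 3, which is the kind of scaling question the Helmert checkbox raises.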

Update: One possibility would be an analogue of R's dummy.coef, which returns a named list of the "full dummy coded" coefficients for each categorical term in a model. It's sparsely documented, but I've seen pointers to "Venables and Ripley (2002, p.165 ff.) or Hastie and Chambers (1992)" (from this paper).

  • [ ] Recover "dummy coded" coefficients for contrast-coded variables (a la R's dummy.coef)
  • [ ] Print the "omitted level" of a contrast (e.g., zeros for Dummy coding; the offset for the baseline for Effects coding, etc.) (printed in a different format to indicate that it's implicit)
  • [ ] Add an "interpreted view" of a coeftable or formula that says which levels or combinations of levels a given coefficient corresponds to (for instance, the intercept, or a main effect when there's an interaction with a categorical variable)
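For the first two items, the core computation could be as simple as multiplying the contrasts matrix by the fitted coefficients for that term. A minimal sketch in plain Python, assuming a single categorical term with no interactions; level_effects is a hypothetical name, and the coefficient values are made up:

```python
def level_effects(contrasts, coefs):
    """Per-level effect implied by a term's fitted coefficients.

    contrasts: one row per level, one column per coefficient.
    coefs: fitted coefficients for the term (intercept excluded).
    """
    return [sum(c * b for c, b in zip(row, coefs)) for row in contrasts]


# Three levels, first level as the base/omitted level.
dummy_coding = [[0, 0],
                [1, 0],
                [0, 1]]
effects_coding = [[-1, -1],
                  [1, 0],
                  [0, 1]]

coefs = [2, 5]  # made-up values for illustration

if __name__ == "__main__":
    print(level_effects(dummy_coding, coefs))    # [0, 2, 5]
    print(level_effects(effects_coding, coefs))  # [-7, 2, 5]
```

The first entry of each result is the implicit "omitted level": zero under dummy coding, and minus the sum of the other effects under effects coding. A real interface would also have to handle the intercept and interaction terms, which this sketch ignores.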

kleinschmidt, Jun 21 '19 18:06

We should have an OrdinalTerm type to go with that.

oxinabox, Jun 21 '19 22:06