margins icon indicating copy to clipboard operation
margins copied to clipboard

`at` and factors

Open vincentarelbundock opened this issue 6 years ago • 7 comments

Please specify whether your issue is about:

  • [ ] a possible bug
  • [X] a question about package functionality
  • [ ] a suggested code or documentation change, improvement to the code, or feature request

What is the proper way to specify factor values in the at list?

library(margins)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
mod <- lm(mpg ~ cyl + hp, dat)
margins(mod, at = list(hp = 100, cyl = '6'))

vincentarelbundock avatar Feb 18 '19 12:02 vincentarelbundock

Yea, this has been a problem at times. I guess it needs to be character. We should add:

  • [ ] Documentation of correct behavior
  • [ ] Some tests (incl. when levels are numeric values as character, as in the mtcars example)

leeper avatar Feb 20 '19 10:02 leeper

Good to know. I'll try to take a look at this if/when I ever complete the plotting project.

Would you be interested in adding a bunch of input checks with something like this: https://github.com/mllg/checkmate

vincentarelbundock avatar Feb 20 '19 12:02 vincentarelbundock

The levels are also being written as numbers rather than their labels:

m <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)
> margins(m, iris, at = list(Species = c("setosa", "versicolor")))
Average marginal effects at specified values
lm(formula = Sepal.Length ~ Sepal.Width * Species, data = iris)

 at(Species) Sepal.Width Speciesversicolor
      setosa      0.6905             1.435
  versicolor      0.8651             1.435
> summary(.Last.value)
            factor Species    AME     SE       z      p  lower  upper
       Sepal.Width  1.0000 0.6905 0.1657  4.1664 0.0000 0.3657 1.0153
       Sepal.Width  2.0000 0.8651 0.2002  4.3212 0.0000 0.4727 1.2575
 Speciesversicolor  1.0000 1.4345 0.1217 11.7846 0.0000 1.1959 1.6731
 Speciesversicolor  2.0000 1.4345 0.1217 11.7846 0.0000 1.1959 1.6731

which is unintuitive.

leeper avatar Apr 10 '19 19:04 leeper

The problem is due to marginal effect(s) for a factor variable are determined partially by a way that contrasts are constructed. For example: if you estimated a model as in the example at the beginning of this thread, '4' is - let's say - internally set as a base level and this determines a set (and interpretation) of model coefficients: you have a coefficient for '6' and a coefficient for '8' describing a difference comparing to '4'. Now comes important question: what do you want to achieve setting cyl='6' in at? With regards to all the other predictors it's clear, but when it comes to cyl I may imagine two options:

  1. If you want to treat '6' as a reference level, that means you need to change a set of model coefficients. In such a case you want to have model coefficients (and respective marginal effects) for '4' and '8' (relative to '6'). Consequently you should re-parameterize model you use (with the simplest but not very efficient method of doing this being to estimate model on a data in which reference level of cyl was changed to '6' with relevel()) and compute marginal effects using that model, not the original one.
  2. If you're not interested in cyl itself (but only in effect setting it to '6' have on marginal effects of other predictors) you may accept setting marginal effects related to cyl to NAs.

tzoltak avatar Aug 07 '19 20:08 tzoltak

First problem that must be solved is that build_datalist() with specified at regarding factor variable returns this variable as a factor with only one level - this on which at points out. As a consequence marginal_effects() simply can't return results in a form that will match model coefficients.

tzoltak avatar Aug 07 '19 21:08 tzoltak

And perhaps dxdy.factor should get levels not from data but from (data that has been used in) a model? I mean, if user don't specify neither data nor at argument while calling margins() these are the same data, however otherwise (with factors being coded differently - either in terms of reference level or in terms of type of contrasts) there can occur differences in set of columns in model matrices derived from this data and from a model (with model coefficients being corresponding to columns of model matrix).

tzoltak avatar Aug 07 '19 23:08 tzoltak

Replacing code at the beginning of dydx.factor() with this below should work.

    levs <- sort(unique(find_data(model)[[variable]]))
    if (variable %in% names(attributes(data)$at)) {
        base <- factor(attributes(data)$at[[variable]], levs)
    } else {
        base <- levs[1L]
    }
    levs <- levs[levs != base]

This should work even with current behavior of build_datalist().

There is one important limitation, however: if user specifies more than one value of a factor variable in at (like: margins(mod, at = list(hp = 100, cyl = c('6', '8')))) margins() will fail at the very end of its work trying to rbind() out because different elements in out will have different sets of columns.

tzoltak avatar Aug 08 '19 07:08 tzoltak