margins
margins copied to clipboard
Problem with factors that have unused levels
margins()
fails in a situation when in a model there is a factor variable with some unused level(s):
## load package
library("margins")
## code goes here
mtcars$gear = factor(mtcars$gear,
c(sort(unique(mtcars$gear)),
"level not in the data"))
table(mtcars$gear)
m <- lm(hp ~ gear, mtcars)
margins(m)
## session info for your system
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] margins_0.3.23
loaded via a namespace (and not attached):
[1] MASS_7.3-51.4 compiler_3.6.0 tools_3.6.0 data.table_1.12.2
[5] prediction_0.3.14
That's is because dydx()
while calling for predictions goes through all the levels of a factor no matter they occur in a data on which model was estimated or not. But in this latter situation model obviously gives no information needed to compute a prediction.
I think that simply adding droplevels()
in line 159 of a file "dydx.R" should work:
levs <- levels(droplevels(as.factor(data[[variable]])))
This should be an easy fix. Thanks for reporting!
I thought about this some more afterwards and I think, that the proper solution will be different to this proposed above: dxdy()
should get set of levels to go through from a model (ie. data stored in a model object), not from a data (even if there are some other levels in a data, model won't tell anything about what to predict for them; of course this only makes a difference if argument data
is provided to margins()
), however it should also determine reference level using information provided by the at
argument of margins()
(if a variable under consideration in a given call to dydx()
appears in at
). So instead of lines 159-161 in dydx.R should be something like this:
levs <- sort(unique(find_data(model)[[variable]]))
if (variable %in% names(attributes(data)$at)) {
base <- factor(attributes(data)$at[[variable]], levs)
} else {
base <- levs[1L]
}
levs <- levs[levs != base]