margins
Using a data.table to feed to lm breaks plot
Hi Thomas,
Hope you are well. I noticed that the cplot function in margins doesn't appear to work when the data one passes to lm is a data.table instead of a data.frame. Perhaps you are already aware, but it's caught me out a couple of times now.
Thanks!
Jack
Please specify whether your issue is about:
- [X] a possible bug
- [ ] a question about package functionality
- [ ] a suggested code or documentation change, improvement to the code, or feature request
If you are reporting (1) a bug or (2) a question about code, please supply:
- a fully reproducible example using a publicly available dataset (or provide your data)
- if an error is occurring, include the output of traceback() run immediately after the error occurs
- the output of sessionInfo()
Put your code here:
## load package
library(margins)
library(data.table)
x <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
y <- rnorm(100)
mydt <- data.table(x,y,z)
mydf <- data.frame(x,y,z)
dtmod <- lm(y ~ x*z, mydt)
dfmod <- lm(y ~ x*z, mydf)
## DT data produces an error
cplot(dtmod, x="x", dx= "z", what = "effect")
#> Error in dat[[dx]] : subscript out of bounds
## DF data does not produce an error
cplot(dfmod, x="x", dx = "z", what = "effect")
## session info for your system
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.2 margins_0.3.23
loaded via a namespace (and not attached):
[1] MASS_7.3-50 compiler_3.5.1 tools_3.5.1 yaml_2.2.0 prediction_0.3.6
Note that this stems from the same underlying cause as this issue in the prediction package (on which margins depends):
https://github.com/leeper/prediction/issues/35
The issue is that margins often selects variables from a data.frame by name (with the name quoted), and that doesn't always work in a data.table unless with=FALSE.
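To make the difference concrete, here is a small sketch of selecting a column whose name is held in a variable. It is only an illustration of the general data.frame vs data.table behaviour, not the actual margins or prediction internals; the objects `mydf`, `mydt` and `col` are made up for the example.

```r
library(data.table)

# Toy data; the names are illustrative only
mydf <- data.frame(x = 1:3, z = c(TRUE, FALSE, TRUE))
mydt <- as.data.table(mydf)

col <- "z"  # column name stored in a variable

# data.frame: `col` is evaluated, so this returns the z column as a vector
mydf[, col]

# data.table: a bare symbol in j is treated as a column name. There is no
# column called "col", so recent data.table releases raise an error here.
attempt <- tryCatch(mydt[, col], error = function(e) conditionMessage(e))
print(attempt)

# Forms that work on a data.table:
by_with <- mydt[, col, with = FALSE]  # one-column data.table
by_dots <- mydt[, ..col]              # same result, using the `..` prefix
by_name <- mydt[[col]]                # plain vector, like a data.frame
```

`[[` behaves the same on both classes, so it is usually the safest way to pull a single column by name in code that may receive either.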
I never figured out what the proper general solution is for using margins with a data.table: in theory, data.table is meant to be a drop-in replacement for data.frame, so it's not clear whether responsibility for this lies with individual package maintainers or with data.table itself. The data.table maintainers tried to minimize this kind of problem with v1.9.8; see point 3 under "breaking changes" here: https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md
But they obviously haven't covered every case.
A kludge solution is to coerce any data used by margins (or by prediction) into a data.frame.
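On the user side, that workaround might look like the sketch below. It assumes the same simulated data as in the report and simply converts to a data.frame before fitting, which is the case reported to work above.

```r
library(margins)
library(data.table)

set.seed(1)
mydt <- data.table(
  x = rnorm(100),
  y = rnorm(100),
  z = sample(c(TRUE, FALSE), 100, replace = TRUE)
)

# Convert to a plain data.frame before the model is fitted, so the
# data that margins recovers from the model is a data.frame
mydf <- as.data.frame(mydt)
dfmod <- lm(y ~ x * z, data = mydf)

cplot(dfmod, x = "x", dx = "z", what = "effect")
```

as.data.frame() makes a copy and leaves mydt untouched, so the rest of a data.table workflow is unaffected.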
I may solve this or I may wait to see what happens when we move plotting to ggplot. Regardless, I'll try to add tests for data.table.
The easiest way to solve this is to ask cplot to coerce data.table objects to data.frame. The data object is only used internally anyway, so it doesn't matter if users lose data.table properties, and it makes it easier to select columns in a uniform way internally.
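As a rough sketch of what such an internal coercion could look like: the helper name `coerce_to_df` is hypothetical and is not taken from margins or marginsplot; it only shows the idea of normalising the data once, before any column selection happens.

```r
library(data.table)

# Hypothetical helper (not part of margins): return a plain data.frame
# when given a data.table, and pass anything else through unchanged
coerce_to_df <- function(data) {
  if (inherits(data, "data.table")) {
    data <- as.data.frame(data)
  }
  data
}

dat <- coerce_to_df(data.table(x = 1:3, z = c(TRUE, FALSE, TRUE)))
dx <- "z"

# After coercion, data.frame-style selection by a name held in a variable works
dat[, dx]
dat[[dx]]
```

Because the coerced copy only lives inside the plotting function, the user's original data.table is not modified.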
I added one line to marginsplot (my ggplot2 override of the plotting functions), and it works. The code pasted below produces this plot:
remotes::install_github('vincentarelbundock/marginsplot')
library(margins)
library(marginsplot)
library(data.table)
x <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
y <- rnorm(100)
mydt <- data.table(x,y,z)
dtmod <- lm(y ~ x*z, mydt)
cplot(dtmod, x="x", dx= "z", what = "effect")
FWIW, this is where I decided to insert the coercion: https://github.com/vincentarelbundock/marginsplot/blob/master/R/cplot.R#L151