Using a data.table to feed to lm breaks plot

Open jblumenau opened this issue 5 years ago • 4 comments

Hi Thomas,

Hope you are well. I noticed that the cplot function in margins doesn't appear to work when the data one passes to lm is a data.table instead of a data.frame. Perhaps you are already aware, but it's caught me out a couple of times now.

Thanks!

Jack

Please specify whether your issue is about:

  • [x] a possible bug
  • [ ] a question about package functionality
  • [ ] a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

  • a fully reproducible example using a publicly available dataset (or provide your data)
  • if an error is occurring, include the output of traceback() run immediately after the error occurs
  • the output of sessionInfo()

Put your code here:

## load package
library(margins)
library(data.table)

x <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
y <- rnorm(100)

mydt <- data.table(x,y,z)
mydf <- data.frame(x,y,z)

dtmod <- lm(y ~ x*z, mydt)
dfmod <- lm(y ~ x*z, mydf)

## DT data produces an error
cplot(dtmod, x="x", dx= "z", what = "effect")

> Error in dat[[dx]] : subscript out of bounds


## DF data does not produce an error
cplot(dfmod, x="x", dx = "z", what = "effect")


## session info for your system
sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.2 margins_0.3.23   

loaded via a namespace (and not attached):
[1] MASS_7.3-50      compiler_3.5.1   tools_3.5.1      yaml_2.2.0       prediction_0.3.6

jblumenau, Jun 14 '19 11:06

Note that this stems from the same underlying cause as this issue in the prediction package (on which margins depends): https://github.com/leeper/prediction/issues/35

The issue is that margins often selects columns from a data.frame by quoted name, and that kind of indexing doesn't always behave the same way on a data.table unless with=FALSE is set.
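To make the difference concrete, here is a minimal sketch of how the same quoted-name indexing behaves on the two classes. It assumes data.table 1.9.8 or later, the toy columns are invented for illustration, and it is not a claim about which line inside margins actually fails.

```r
library(data.table)

# Toy data, invented for illustration only
df <- data.frame(x = 1:3, z = c(TRUE, FALSE, TRUE))
dt <- as.data.table(df)

# Same syntax, different return type:
class(df[, "x"])   # a plain vector (data.frame drops the dimension)
class(dt[, "x"])   # still a data.table (assuming data.table >= 1.9.8)

# Column names held in a variable need with = FALSE in a data.table
cols <- c("x", "z")
sub_dt <- dt[, cols, with = FALSE]

# `[[` extracts a single column as a vector from both classes
identical(df[["x"]], dt[["x"]])
```

Code written for data.frame that expects `dat[, "x"]` to return a vector will therefore receive a one-column table when handed a data.table.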

I never figured out the proper general solution for using margins with a data.table. In theory, data.table is meant to be a drop-in replacement for data.frame, so it's not clear whether responsibility for this lies with individual package maintainers or with data.table itself. The data.table maintainers tried to minimize this kind of problem in v1.9.8 (see point 3 under "breaking changes" here: https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md), but they evidently haven't covered every case.

A kludge solution is to coerce any data used by margins (or by prediction) into a data.frame.
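On the user side, that workaround might look like the sketch below. The seed and the toy data are assumptions that mirror the example in the original report, and the cplot call is left commented out because it needs margins and a graphics device.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the toy data is reproducible
mydt <- data.table(
  x = rnorm(100),
  y = rnorm(100),
  z = sample(c(TRUE, FALSE), 100, replace = TRUE)
)

# Convert before modelling; mydt itself remains a data.table
mydf <- as.data.frame(mydt)

# Fit on the plain data.frame so the model carries a data.frame
dfmod <- lm(y ~ x * z, data = mydf)

# cplot(dfmod, x = "x", dx = "z", what = "effect")  # needs library(margins)
```

If a second copy of the data is a concern, data.table::setDF() converts in place instead, at the cost of changing the original object.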

danschrage, Nov 19 '19 18:11

I may solve this or I may wait to see what happens when we move plotting to ggplot. Regardless, I'll try to add tests for data.table.

leeper, Dec 22 '19 09:12

The easiest way to solve this is to ask cplot to coerce data.table objects to data.frame. The data object is only used internally anyway, so it doesn't matter if users lose data.table properties, and it makes it easier to select columns in a uniform way internally.
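A rough sketch of that idea follows. Both function names, `as_plain_df` and `get_columns`, are hypothetical and are not taken from margins or marginsplot; they only illustrate coercing once at the top of an internal function and then indexing uniformly.

```r
library(data.table)

# Hypothetical helper, not part of margins or marginsplot:
# strip data.table behaviour, leave other inputs alone
as_plain_df <- function(data) {
  if (inherits(data, "data.table")) {
    data <- as.data.frame(data)
  }
  data
}

# Hypothetical internal function that selects columns by quoted name
get_columns <- function(data, x, dx) {
  dat <- as_plain_df(data)  # local copy; the caller's object is untouched
  list(x = dat[[x]], dx = dat[[dx]])
}

dt <- data.table(x = c(0.1, 0.2), z = c(TRUE, FALSE))
cols <- get_columns(dt, x = "x", dx = "z")
```

Because the coercion happens on a local variable, the user's data.table keeps its class and keys outside the function.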

I added one line to marginsplot (my ggplot2 override of the plotting functions), and it works. The code pasted below produces this plot:

[Image: Rplot]

remotes::install_github('vincentarelbundock/marginsplot')

library(margins)
library(marginsplot)
library(data.table)
x <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
y <- rnorm(100)
mydt <- data.table(x,y,z)
dtmod <- lm(y ~ x*z, mydt)
cplot(dtmod, x="x", dx= "z", what = "effect")

vincentarelbundock, Mar 05 '20 23:03

FWIW, this is where I decided to insert the coercion: https://github.com/vincentarelbundock/marginsplot/blob/master/R/cplot.R#L151

vincentarelbundock, Mar 05 '20 23:03