dtplyr icon indicating copy to clipboard operation
dtplyr copied to clipboard

Ensure code executed by dtplyr is marked as data.table aware

Open hadley opened this issue 3 years ago • 9 comments

When called from within another package, dt_eval() ends up with a root environment that's the other package's namespace (which might not import data.table). It would be nice if dtplyr always worked without intervention by the package author. But cedta() always calls topenv() so it's not clear how to tell data.table that this specific call is ok (even if other calls from the same package aren't data.table aware).

See discussion in https://github.com/ianmcook/tidyquery/issues/17

@MichaelChirico any ideas?

hadley avatar Feb 05 '21 14:02 hadley

Another option would to to set option(topLevelEnvironment) to the evaluation environment and then set .datatable.aware <- TRUE in that env. It's not clear what else might be affected by changing the topLevelEnvironment, but it is unlikely to affect code typically seen in a data manipulation pipeline.

hadley avatar Feb 05 '21 16:02 hadley

if the package is expecting some data.table functionality, defining .datatable.aware <- TRUE seems like the right way to go, no? In my mind it's like an even more dotted-line version of Suggests.

MichaelChirico avatar Feb 07 '21 02:02 MichaelChirico

@MichaelChirico but the package isn't necessarily using data.table, it's using dtplyr. dtplyr is definitely using data.table, but has no way to communicate that to data.table.

hadley avatar Feb 07 '21 15:02 hadley

But if relying on dtplyr, certainly that author is aware of data.table, so they can signal that to data.table with .datatable.aware... what you're going for is that dtplyr "just works" without needing this little kludge, right?

Another option would to to set option(topLevelEnvironment) to the evaluation environment and then set .datatable.aware <- TRUE in that env. It's not clear what else might be affected by changing the topLevelEnvironment, but it is unlikely to affect code typically seen in a data manipulation pipeline.

Could dtplyr check isNamespace in the calling environments and unlock them, then assign .datatable.aware if it's not already defined? (Possibly with warning...)

MichaelChirico avatar Feb 08 '21 07:02 MichaelChirico

@MichaelChirico right, it would be nice if dtplyr could declare this without requiring an hacks. I'd really prefer not to be writing into other people's namespaces. That said, if you don't think there's interest in resolving this on the data.table side, I can either document as a requirement for package authors or implement my own kludge.

hadley avatar Feb 08 '21 12:02 hadley

Could you file a PR citing this issue adding dtplyr here:

    "data.table" %chin% names(imports <- getNamespaceImports(ns)) ||
    "dtplyr" %chin% names(imports) ||

MichaelChirico avatar Feb 08 '21 17:02 MichaelChirico

What about also checking for .datatable.aware in the evaluation environment? Something like:

env = parent.frame(n)
if (isTRUE(env$.datatable.aware)) {
  return(TRUE)
}

ns = topenv(env)

hadley avatar Feb 08 '21 17:02 hadley

That also looks doable. I would probably set .datatable.aware=TRUE inside the data.table namespace as well since we aim for that case to be handled first.

I'm not sure all of the intricacies at play here (in fact I am not sure we even need cedta() any more), the PR would help flesh that out. Thanks!

MichaelChirico avatar Feb 10 '21 03:02 MichaelChirico

Now implemented in data.table, so can add to next version of dtplyr.

hadley avatar Jul 30 '21 14:07 hadley