magrittr
magrittr copied to clipboard
`%<>%` is evaluated in wrong environment
%<>% breaks when the user redefines <- because it evaluates the assignment (via <-) in the caller’s environment. This shouldn’t happen; instead, while both lhs and rhs should be evaluated in the caller’s environment, <- should be evaluated in the package’s environment (or ditched in favour of assign).
I’d submit a PR but pipe looks sufficiently complicated that I prefer not to meddle. I do believe it would be enough to replace the assignment statement with
assign(deparse(lhs), result[["value"]], envir = parent)
Of all the masking one might think of, this is probably the one I thougth no one would dare touch; there would need to be a very good reason to be such a daredevil :)
In any case, you are right, and I'll think of which solution I like better, but it should be a rather simple fix..
I tried a few things today to fix this, even made pushed a commit that turned out not to work. The situation is a little more complex than one might think:
assignwon't work in situations where the lhs is e.g.x[1:10]orx$y.- constructing the call to use
base::'<-'rather than just<-will not work either because "dispatch" to e.g.[<-,[[<-, etc, will not be done.
I am not aware of a "pretty" way of overcoming these issues, and I am not sure it is worth building in the "dispatch" manually, as this is a rare problem (who, other than you would mask <- ;))?
@hadley are you aware of a clever way around the two issues mentioned above?
Ouch, this sounds painful, I hadn’t considered that.
However, I think there is only a limited number of cases that need to be considered:
- Free assignment of the form
a %<>% b - Function call assignment
f(a) %<>% b - Indexer assignment
a[i] %<>%,a[[i]] %<>% bora$i %<>% b.
And all of them could be caught with a simple check of the unevaluated expression. Correct?
I … somewhat agree that there may not be priority to fix this upstream. I might pull and patch a local version in the context of the module which is overriding <-.
Using @hadley's lazyeval package, I have come with an assign function that seems to work on arbitrary left-hand sides:
library(lazyeval)
library(magrittr)
doAssign <- function(lhs, rhs) {
lhs <- lazy(lhs)
rhs <- lazy(rhs)
assign.expr <- str_join(deparse(lhs$expr), " <- ", deparse(rhs$expr)) %>% as.lazy(env=lhs$env)
lazy_eval(assign.expr)
}
Basically, doAssign(LHS,RHS) should always do the same thing as LHS <- RHS. I'm sure I got some scoping subtleties wrong, but hopefully this is a reasonable starting point.
Example usage:
x <- list(list(a=5))
print(x)
doAssign(x[[1]]$a, 3) # Compare: x[[1]]$a <- 3
print(x)
doAssign(attr(x[[1]]$a, "myattr"), TRUE) # Compare: attr(x[[1]]$a, "myattr") <- TRUE
print(x)
@DarwinAwardWinner, your version will still try to invoke <- for cases where the left-hand side is an object. It’s more or less equivalent to just using the normal lhs <- rhs assign (modulo scoping issues, as you’ve suspected).
I'm not sure I understand the issue here. Are you talking about the case where <- is a generic or something? Can you give a small example where it fails? Then I could play around with it.
Here’s an MWE:
`<-` = function (body, params) {
vars = all.vars(substitute(params))
formals = as.pairlist(setNames(replicate(length(vars), quote(expr = )), vars))
eval.parent(call('function', formals, substitute(body)))
}
# Usage:
sapply(1 : 4, x -> 2 * x)
# [1] 2 4 6 8
library(lazyeval)
library(magrittr)
library(stringr)
doAssign = function(lhs, rhs) {
lhs = lazy(lhs)
rhs = lazy(rhs)
assign.expr = str_join(deparse(lhs$expr), " <- ", deparse(rhs$expr)) %>% as.lazy(env=lhs$env)
lazy_eval(assign.expr)
}
doAssign(x, 42)
# function ()
# x
@klmr why not settle for a less (but still slightly) outrageous operator overload, say using <<-, and do
sapply(1:4, x ->> 2*x)
Using <- seems like a dangerous game, even if magrittr "fixed" this issue.
@smbache In a way that ship has sailed, it’s already in pervasive use. To be honest, it’s also because I’m stubborn: I find that <- is simply the perfect fit for a concise1 lambda notation in R, and the redundancy between <- and = is simply too brilliant not to be exploited.
You’re right that this is a dangerous game, of course. It’s unfortunate that getting eval scoping right in R is really hard (hence these problems). – I already submitted (and got accepted) bug fixes to two other packages, one of which is a core R package. :frowning:
I would totally understand if you didn’t want to spend time on this.
1 And, yes, “concise” is crucial here. We actually thought very hard about this. Of the numerous alternatives considered, <- was picked because three characters (e.g. a custom operator %>%) is already too long; the idea, after all, is to be substantially shorter than the function (x) … syntax. = would be even shorter of course, but syntax matters, and x = 2 * x is too cryptic. x -> 2 * x is extremely clear, more so even than x := 2 * x (another alternative).
I think I figured out how to fix this. Replace
eval(call("<-", lhs, result[["value"]]), parent, parent)
with
eval(as.call(list(base::`<-`, lhs, result[["value"]])), parent, parent)
@DarwinAwardWinner I actually tried that, see here:
https://github.com/smbache/magrittr/commit/4deda1750e447be3f822b2150336089a90e985ca
What's wrong with that solution? It seemed to solve this issue for me. Does it break some other case?
@DarwinAwardWinner Yes (but I didn't realize that either); For example, when using <- for an x[y] type LHS, it is base::'[<-' that is called and not base::'<-', and then x[1:10] %<>% add(5) won't work. Similar other examples exist with the various *<- functions. I guess <- is special that way.
maybe refering to .Primitive("<-") would work.. initial try would suggest so...
For me, base::'<-' and .Primitive("<-") are the same thing. That's probably why it worked for me, including the example that you gave: x[1:10] %<>% add(5), as well as every other subset assignment I tried.
I'll admit I am having a hard time reproducing the problems I ran into when I undid the fix...
By the way, I originally became interested in manipulating the assignment operator because I wanted to implement something like a "mutate" function to capture the idiom of modifying pieces of the thing being piped (but still passing the full object to the next thing int the pipeline) So for example:
... %>% { subslot(slot(.))[1:5] <- value; . } %>% ...
would become
... %>% mutate( subslot(slot(.))[1:5], value ) %>% ...
It's not shorter, but it does clearly declare what's happening. Whereas it's easy to miss the dot before the closing curly brace in the first example. (Maybe "replace" is a better name? I haven't settled on a good name yet.) Anyway, would you have any interest in merging something like that?
Here's how I provoke an error...
change the assignment part with the following (or any of the other suggestions):
cl <- call("<-", lhs, result[["value"]])
cl[[1L]] <- base::`<-`
eval(cl, parent, parent)
and this will fail
x <- 1:10
`<-` <- function(a, b) stop("Wrong assignment used")
x[1:2] %<>% add(2)
Error message:
Error in x <- `[<-`(`*tmp*`, 1:2, value = c(3, 4)) : Wrong assignment used.
Is this an R bug, or does the assignment operator work really weirdly?
Here's a test case that triggers the same error without using magrittr that we could include in a bug report:
x = 1:10
`<-` <- function(...) stop("Wrong assignment used")
.Primitive("<-")(x[1:2], 10)
Maybe @hadley knows a bit more; but it at least seems suspicious... and warrents at least being questioned... If you do file a bug report let me know :)
I'd say that's probably technical a bug, but no one will ever fix it because it's such a weird corner case
Isn't this the usual thing, the strange order of the search() path? [<- calls <-, and the one in the global env is found first, before the primitive one? I need to run now, but I'll check it later....
So would that be a bug of [<- and similar assignment operators looking in .GlobalEnv for the <- function? I would think they wouldn't do that.
I think evaluating <- in the calling environment is actually the correct behaviour. The user might have defined replacement functions and %<>% should pick up on them.
Also if you redefine <-, it makes sense that %<>% uses that definition since it's basically a shortcut for foo <- foo %>%.
@lionel- Yes, %<>% should definitely pick up replacement functions; I’m not convinced that it should pick up redefinitions of <- itself. At the very least this is inconsistent with the rest of R. For instance, replacement functions, which also use assignment internally (repeatedly: to `*tmp*` and to the assignee variable), do not pick up redefinitions of <-.
Picking up redefinitions of <- is also problematic since both <- and = exist, and the user may be using =. Why give preference to <-? What happens if users redefine = instead?
At the very least this is inconsistent with the rest of R. For instance, replacement functions, which also use assignment internally (repeatedly: to
*tmp*and to the assignee variable), do not pick up redefinitions of <-.
I don't find this argument convincing. Replacement functions are called by <- and =. They don't use assignment internally, at least not in the sense of calling <- or =.
Picking up redefinitions of <- is also problematic since both <- and = exist, and the user may be using =. Why give preference to <-? What happens if users redefine = instead?
I think it's the responsibility of the owner of the lexical context of the caller to make sure <- and = are consistent. Defining %<>% as a wrapper around <- seems reasonable to me.
Yeah, I would be happy if x %<>% y always did exactly the same as x <- x %>% y for any x and y in any context, and I'll assume responsibility for making sure x <- x %>% y does something sane.
The documentation says "some_object %<>% foo %>% bar is equivalent to some_object <- some_object %>% foo %>% bar", if someone wants to redefine <- that's their choice.
@lionel-
[Replacement functions] don't use assignment internally, at least not in the sense of calling
<-or=.
I’m not sure I follow. According to the R documentation, the call f(x) <- y is literally transformed into the equivalent of
`*tmp*` <- x
x <- "f<-"(`*tmp*`, y)
rm(`*tmp*`)
There are two assignments happening in this code. And the fact that these assignments don’t call <- is exactly the point, is it not? Because I argue that, for consistency, %<>% should act the same way. That is, x %<>% y should logically be transformed into x <- x %>% y but without actually calling <-, same as in the replacement function usage.
I think it's the responsibility of the owner of the lexical context of the caller to make sure
<-and=are consistent.
Speaking in the context that originally triggered this issue report: that would defeat the purpose. The issue was reported because I specifically redefined a redundant operator to perform a new, more useful action (namely, an args -> body lambda notation). Sure, it’s unorthodox but nothing forbids it, and I generally expect packages not to meddle in my local scope (and vice-versa): usual R rules allow the “owner of the lexical context” to do whatever they please within that context.
Now, from a practical perspective I find it entirely acceptable that %<>% behaves the way it does. But it’s not ideal.
And the fact that these assignments don’t call <- is exactly the point, is it not?
Agreed, cf "at least not in the sense of calling <- or =".
That is, x %<>% y should logically be transformed into x <- x %>% y but without actually calling <-, same as in the replacement function usage.
The pseudo-code you mention is triggered by calling <- or = in a given lexical context, when that context uses the base definitions. This is what %<>% does as well, which makes sense to me.
The main question is whether %<>% is a wrapper around lexical <- or base <-. It is possible that you are right that the latter is better. I think the problem is that spending time to think about this is not worth it.