magrittr icon indicating copy to clipboard operation
magrittr copied to clipboard

`%<>%` is evaluated in wrong environment

Open klmr opened this issue 10 years ago • 30 comments

%<>% breaks when the user redefines <- because it evaluates the assignment (via <-) in the caller’s environment. This shouldn’t happen; instead, while both lhs and rhs should be evaluated in the caller’s environment, <- should be evaluated in the package’s environment (or ditched in favour of assign).

I’d submit a PR but pipe looks sufficiently complicated that I prefer not to meddle. I do believe it would be enough to replace the assignment statement with

assign(deparse(lhs), result[["value"]], envir = parent)

klmr avatar Dec 08 '14 16:12 klmr

Of all the masking one might think of, this is probably the one I thougth no one would dare touch; there would need to be a very good reason to be such a daredevil :)

In any case, you are right, and I'll think of which solution I like better, but it should be a rather simple fix..

smbache avatar Dec 09 '14 21:12 smbache

I tried a few things today to fix this, even made pushed a commit that turned out not to work. The situation is a little more complex than one might think:

  • assign won't work in situations where the lhs is e.g. x[1:10] or x$y.
  • constructing the call to use base::'<-'rather than just <- will not work either because "dispatch" to e.g. [<-, [[<-, etc, will not be done.

I am not aware of a "pretty" way of overcoming these issues, and I am not sure it is worth building in the "dispatch" manually, as this is a rare problem (who, other than you would mask <- ;))?

@hadley are you aware of a clever way around the two issues mentioned above?

smbache avatar Feb 09 '15 19:02 smbache

Ouch, this sounds painful, I hadn’t considered that.

However, I think there is only a limited number of cases that need to be considered:

  • Free assignment of the form a %<>% b
  • Function call assignment f(a) %<>% b
  • Indexer assignment a[i] %<>%, a[[i]] %<>% b or a$i %<>% b.

And all of them could be caught with a simple check of the unevaluated expression. Correct?

I … somewhat agree that there may not be priority to fix this upstream. I might pull and patch a local version in the context of the module which is overriding <-.

klmr avatar Feb 10 '15 13:02 klmr

Using @hadley's lazyeval package, I have come with an assign function that seems to work on arbitrary left-hand sides:

library(lazyeval)
library(magrittr)
doAssign <- function(lhs, rhs) {
    lhs <- lazy(lhs)
    rhs <- lazy(rhs)
    assign.expr <- str_join(deparse(lhs$expr), " <- ", deparse(rhs$expr)) %>% as.lazy(env=lhs$env)
    lazy_eval(assign.expr)
}

Basically, doAssign(LHS,RHS) should always do the same thing as LHS <- RHS. I'm sure I got some scoping subtleties wrong, but hopefully this is a reasonable starting point.

Example usage:

x <- list(list(a=5))
print(x)
doAssign(x[[1]]$a, 3) # Compare: x[[1]]$a <- 3
print(x)
doAssign(attr(x[[1]]$a, "myattr"), TRUE) # Compare: attr(x[[1]]$a, "myattr") <- TRUE
print(x)

DarwinAwardWinner avatar Feb 25 '15 21:02 DarwinAwardWinner

@DarwinAwardWinner, your version will still try to invoke <- for cases where the left-hand side is an object. It’s more or less equivalent to just using the normal lhs <- rhs assign (modulo scoping issues, as you’ve suspected).

klmr avatar Feb 25 '15 21:02 klmr

I'm not sure I understand the issue here. Are you talking about the case where <- is a generic or something? Can you give a small example where it fails? Then I could play around with it.

DarwinAwardWinner avatar Feb 25 '15 22:02 DarwinAwardWinner

Here’s an MWE:

`<-` = function (body, params) {
    vars = all.vars(substitute(params))
    formals = as.pairlist(setNames(replicate(length(vars), quote(expr = )), vars))
    eval.parent(call('function', formals, substitute(body)))
}

# Usage: 
sapply(1 : 4, x -> 2 * x)
# [1] 2 4 6 8

library(lazyeval)
library(magrittr)
library(stringr)
doAssign = function(lhs, rhs) {
    lhs = lazy(lhs)
    rhs = lazy(rhs)
    assign.expr = str_join(deparse(lhs$expr), " <- ", deparse(rhs$expr)) %>% as.lazy(env=lhs$env)
    lazy_eval(assign.expr)
}

doAssign(x, 42)
# function ()
# x

klmr avatar Feb 26 '15 08:02 klmr

@klmr why not settle for a less (but still slightly) outrageous operator overload, say using <<-, and do

sapply(1:4, x ->> 2*x)

Using <- seems like a dangerous game, even if magrittr "fixed" this issue.

smbache avatar Feb 26 '15 15:02 smbache

@smbache In a way that ship has sailed, it’s already in pervasive use. To be honest, it’s also because I’m stubborn: I find that <- is simply the perfect fit for a concise1 lambda notation in R, and the redundancy between <- and = is simply too brilliant not to be exploited.

You’re right that this is a dangerous game, of course. It’s unfortunate that getting eval scoping right in R is really hard (hence these problems). – I already submitted (and got accepted) bug fixes to two other packages, one of which is a core R package. :frowning:

I would totally understand if you didn’t want to spend time on this.


1 And, yes, “concise” is crucial here. We actually thought very hard about this. Of the numerous alternatives considered, <- was picked because three characters (e.g. a custom operator %>%) is already too long; the idea, after all, is to be substantially shorter than the function (x) … syntax. = would be even shorter of course, but syntax matters, and x = 2 * x is too cryptic. x -> 2 * x is extremely clear, more so even than x := 2 * x (another alternative).

klmr avatar Feb 26 '15 15:02 klmr

I think I figured out how to fix this. Replace

eval(call("<-", lhs, result[["value"]]), parent, parent)

with

eval(as.call(list(base::`<-`, lhs, result[["value"]])), parent, parent)

DarwinAwardWinner avatar Feb 27 '15 18:02 DarwinAwardWinner

@DarwinAwardWinner I actually tried that, see here:

https://github.com/smbache/magrittr/commit/4deda1750e447be3f822b2150336089a90e985ca

smbache avatar Mar 08 '15 14:03 smbache

What's wrong with that solution? It seemed to solve this issue for me. Does it break some other case?

DarwinAwardWinner avatar Mar 08 '15 17:03 DarwinAwardWinner

@DarwinAwardWinner Yes (but I didn't realize that either); For example, when using <- for an x[y] type LHS, it is base::'[<-' that is called and not base::'<-', and then x[1:10] %<>% add(5) won't work. Similar other examples exist with the various *<- functions. I guess <- is special that way.

smbache avatar Mar 08 '15 18:03 smbache

maybe refering to .Primitive("<-") would work.. initial try would suggest so...

smbache avatar Mar 08 '15 19:03 smbache

For me, base::'<-' and .Primitive("<-") are the same thing. That's probably why it worked for me, including the example that you gave: x[1:10] %<>% add(5), as well as every other subset assignment I tried.

DarwinAwardWinner avatar Mar 08 '15 20:03 DarwinAwardWinner

I'll admit I am having a hard time reproducing the problems I ran into when I undid the fix...

smbache avatar Mar 09 '15 18:03 smbache

By the way, I originally became interested in manipulating the assignment operator because I wanted to implement something like a "mutate" function to capture the idiom of modifying pieces of the thing being piped (but still passing the full object to the next thing int the pipeline) So for example:

... %>% { subslot(slot(.))[1:5] <- value; . } %>% ...

would become

... %>% mutate( subslot(slot(.))[1:5], value ) %>% ...

It's not shorter, but it does clearly declare what's happening. Whereas it's easy to miss the dot before the closing curly brace in the first example. (Maybe "replace" is a better name? I haven't settled on a good name yet.) Anyway, would you have any interest in merging something like that?

DarwinAwardWinner avatar Mar 12 '15 02:03 DarwinAwardWinner

Here's how I provoke an error...

change the assignment part with the following (or any of the other suggestions):

cl <- call("<-", lhs, result[["value"]])
cl[[1L]] <- base::`<-`
eval(cl, parent, parent)

and this will fail

x <- 1:10
`<-` <- function(a, b) stop("Wrong assignment used")
x[1:2] %<>% add(2)

Error message:

Error in x <- `[<-`(`*tmp*`, 1:2, value = c(3, 4)) : Wrong assignment used. 

smbache avatar Mar 20 '15 18:03 smbache

Is this an R bug, or does the assignment operator work really weirdly?

Here's a test case that triggers the same error without using magrittr that we could include in a bug report:

x = 1:10
`<-` <- function(...) stop("Wrong assignment used")
.Primitive("<-")(x[1:2], 10)

DarwinAwardWinner avatar Mar 20 '15 20:03 DarwinAwardWinner

Maybe @hadley knows a bit more; but it at least seems suspicious... and warrents at least being questioned... If you do file a bug report let me know :)

smbache avatar Mar 20 '15 20:03 smbache

I'd say that's probably technical a bug, but no one will ever fix it because it's such a weird corner case

hadley avatar Mar 20 '15 20:03 hadley

Isn't this the usual thing, the strange order of the search() path? [<- calls <-, and the one in the global env is found first, before the primitive one? I need to run now, but I'll check it later....

gaborcsardi avatar Mar 20 '15 21:03 gaborcsardi

So would that be a bug of [<- and similar assignment operators looking in .GlobalEnv for the <- function? I would think they wouldn't do that.

DarwinAwardWinner avatar Apr 29 '15 22:04 DarwinAwardWinner

I think evaluating <- in the calling environment is actually the correct behaviour. The user might have defined replacement functions and %<>% should pick up on them.

Also if you redefine <-, it makes sense that %<>% uses that definition since it's basically a shortcut for foo <- foo %>%.

lionel- avatar Jul 23 '20 08:07 lionel-

@lionel- Yes, %<>% should definitely pick up replacement functions; I’m not convinced that it should pick up redefinitions of <- itself. At the very least this is inconsistent with the rest of R. For instance, replacement functions, which also use assignment internally (repeatedly: to `*tmp*` and to the assignee variable), do not pick up redefinitions of <-.

Picking up redefinitions of <- is also problematic since both <- and = exist, and the user may be using =. Why give preference to <-? What happens if users redefine = instead?

klmr avatar Jul 23 '20 09:07 klmr

At the very least this is inconsistent with the rest of R. For instance, replacement functions, which also use assignment internally (repeatedly: to *tmp* and to the assignee variable), do not pick up redefinitions of <-.

I don't find this argument convincing. Replacement functions are called by <- and =. They don't use assignment internally, at least not in the sense of calling <- or =.

Picking up redefinitions of <- is also problematic since both <- and = exist, and the user may be using =. Why give preference to <-? What happens if users redefine = instead?

I think it's the responsibility of the owner of the lexical context of the caller to make sure <- and = are consistent. Defining %<>% as a wrapper around <- seems reasonable to me.

lionel- avatar Jul 23 '20 11:07 lionel-

Yeah, I would be happy if x %<>% y always did exactly the same as x <- x %>% y for any x and y in any context, and I'll assume responsibility for making sure x <- x %>% y does something sane.

DarwinAwardWinner avatar Jul 23 '20 11:07 DarwinAwardWinner

The documentation says "some_object %<>% foo %>% bar is equivalent to some_object <- some_object %>% foo %>% bar", if someone wants to redefine <- that's their choice.

smbache avatar Jul 23 '20 11:07 smbache

@lionel-

[Replacement functions] don't use assignment internally, at least not in the sense of calling <- or =.

I’m not sure I follow. According to the R documentation, the call f(x) <- y is literally transformed into the equivalent of

`*tmp*` <- x
x <- "f<-"(`*tmp*`, y)
rm(`*tmp*`)

There are two assignments happening in this code. And the fact that these assignments don’t call <- is exactly the point, is it not? Because I argue that, for consistency, %<>% should act the same way. That is, x %<>% y should logically be transformed into x <- x %>% y but without actually calling <-, same as in the replacement function usage.

I think it's the responsibility of the owner of the lexical context of the caller to make sure <- and = are consistent.

Speaking in the context that originally triggered this issue report: that would defeat the purpose. The issue was reported because I specifically redefined a redundant operator to perform a new, more useful action (namely, an args -> body lambda notation). Sure, it’s unorthodox but nothing forbids it, and I generally expect packages not to meddle in my local scope (and vice-versa): usual R rules allow the “owner of the lexical context” to do whatever they please within that context.

Now, from a practical perspective I find it entirely acceptable that %<>% behaves the way it does. But it’s not ideal.

klmr avatar Jul 24 '20 09:07 klmr

And the fact that these assignments don’t call <- is exactly the point, is it not?

Agreed, cf "at least not in the sense of calling <- or =".

That is, x %<>% y should logically be transformed into x <- x %>% y but without actually calling <-, same as in the replacement function usage.

The pseudo-code you mention is triggered by calling <- or = in a given lexical context, when that context uses the base definitions. This is what %<>% does as well, which makes sense to me.

The main question is whether %<>% is a wrapper around lexical <- or base <-. It is possible that you are right that the latter is better. I think the problem is that spending time to think about this is not worth it.

lionel- avatar Jul 27 '20 06:07 lionel-