WISH: Avoid recycling of vectors of size != 1 in arithmetic and logical operations

Open krlmlr opened this issue 4 years ago • 12 comments

From my experience, I almost always combine vectors of same lengths, or a vector and a scalar, in arithmetical and logical operations. It's very rare that I combine two vectors of size > 1. Still, recycling to the longer vector is the default setting.

In the example below, using == instead of %in% gives inconsistent results depending on the input:

which(LETTERS == c("A", "B"))
#> [1] 1 2
which(rev(LETTERS) == c("A", "B"))
#> integer(0)

Created on 2019-11-17 by the reprex package (v0.3.0)

There are many more pitfalls due to the automatic recycling.

Do we really need to recycle vectors of size != 1? If this is really desired, rep_len() does this out of the box (but doesn't warn yet if only parts of the vector are used).
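
To illustrate the point about rep_len(), here is a small sketch comparing implicit recycling with its explicit counterpart. The variable names are made up for the example.

```r
x <- 1:6
y <- c(10, 20)

# Implicit recycling: y is silently repeated to the length of x.
implicit <- x + y

# Explicit recycling with rep_len(): same result, but the intent is visible.
explicit <- x + rep_len(y, length(x))

identical(implicit, explicit)
#> [1] TRUE

# rep_len() truncates silently when the target length is not a multiple
# of the input length: only part of the last repetition is used.
rep_len(1:4, 6)
#> [1] 1 2 3 4 1 2
```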

What am I missing?

Proposal 2

  1. Recycling of sizes != 1 could start raising a warning that is muffled by default. Existing code won't notice; new code and library code can catch this warning and turn it into an error. Eventually this warning can be promoted to a noisier warning or error (see "Proposal 1" below).
  2. Provide a helper rep_to() that recycles the first argument to the length of the second argument, mimicking current auto-recycling behavior.

Proposal 1

If we agree that recycling does more harm than good, we could proceed as follows:

  1. Check which packages really rely on automatic recycling, perhaps by adding an environment variable in the spirit of _R_CHECK_LENGTH_1_LOGIC2_ (#48)
  2. Provide a helper rep_to() that recycles the first argument to the length of the second argument, mimicking current auto-recycling behavior
  3. Gradually turn this into a warning and then into an error, hinting at rep_to()
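
A rough sketch of what rep_to() might look like. The function does not exist in base R; the name comes from the proposal, while the argument names, the zero-length handling, and the warning text are assumptions made for illustration only.

```r
# Hypothetical helper, not part of base R: recycle `x` to the length of `to`.
rep_to <- function(x, to) {
  n <- length(to)
  if (length(x) == 0L) {
    if (n == 0L) return(x)
    stop("Cannot recycle a zero-length vector to length ", n, ".", call. = FALSE)
  }
  # Assumption: warn when only part of the last repetition would be used.
  if (n %% length(x) != 0L) {
    warning("Target length is not a multiple of the input length.", call. = FALSE)
  }
  rep_len(x, n)
}

# Explicit version of the recycling in the LETTERS example.
which(LETTERS == rep_to(c("A", "B"), LETTERS))
#> [1] 1 2
```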

krlmlr avatar Nov 17 '19 18:11 krlmlr

One place where recycling is handy and natural is with matrices, e.g. X + a where X is an n-by-k matrix and a is a vector of length n.
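
A small example of the matrix case, with made-up values: because matrices are stored column by column, a vector of length n is added down each column, so row i gets a[i]. sweep() spells out the same operation explicitly.

```r
X <- matrix(0, nrow = 3, ncol = 2)  # n = 3, k = 2
a <- c(1, 2, 3)                     # length n

# a is recycled down each column of X.
X + a
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    2
#> [3,]    3    3

# Explicit equivalent: add a along the rows (margin 1).
all(X + a == sweep(X, 1, a, "+"))
#> [1] TRUE
```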

HenrikBengtsson avatar Nov 17 '19 18:11 HenrikBengtsson

It's rarely needed, but I always saw this as something nice to have.

For example, I have sometimes used this to select every 2nd (3rd, etc.) element or row:

letters[c(TRUE, FALSE)]
# [1] "a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"

michaeldorman avatar Nov 17 '19 20:11 michaeldorman

I will admit I've done this myself on occasion, but this is very much "clever code", and one should never write that anyway.

That said, I don't think we could get rid of that; there is too much clever code around. The one thing that might be considered, though it would still be an uphill battle, is making it an error when the lengths don't conform (when the longer one is not a multiple of the shorter one), instead of the warning it is now. Certainly any "strict mode", a topic that has been discussed in the R community at various times, would disallow it (and likely the clever version as well, when the short length isn't 1).
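
For reference, a short sketch of the current behaviour being discussed: conforming lengths recycle silently, non-conforming lengths recycle with a warning. The message text shown assumes an English locale.

```r
# Longer length is a multiple of the shorter one: silent recycling.
1:6 + 1:2
#> [1] 2 4 4 6 6 8

# Longer length is not a multiple: R still recycles, but warns.
msg <- tryCatch(1:6 + 1:4, warning = function(w) conditionMessage(w))
msg
#> [1] "longer object length is not a multiple of shorter object length"
```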

gmbecker avatar Nov 17 '19 20:11 gmbecker

"... making it an error when the lengths don't conform (when the longer one is not a multiple of the shorter one), instead of the warning it is now."

A few months ago there was a base R commit that laid the groundwork for registering top-level condition handlers, i.e. allowing users (and pkgs?) to subscribe to conditions such as errors and warnings, and take action. With this in place you could imagine listening for the warnings you mention and escalating them to errors. This will work better if we have better (class) annotations on our conditions, which is also something that saw some progress (first steps) when R 3.6.0 was released.
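
A minimal sketch of the escalation idea, scoped to a single expression with withCallingHandlers() rather than a top-level handler (the top-level mechanism shipped as globalCallingHandlers() in R 4.0.0). The wrapper name strict_recycling() is made up. Since the warning carries no dedicated class, the sketch matches on the message text, which is fragile and assumes an English locale.

```r
# Hypothetical wrapper: escalate the non-conforming-length warning to an error.
strict_recycling <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      # No specific condition class is available, so match on the message.
      if (grepl("not a multiple of shorter object length",
                conditionMessage(w), fixed = TRUE)) {
        stop("Non-conforming lengths: ", conditionMessage(w), call. = FALSE)
      }
    }
  )
}

# Conforming lengths pass through unchanged.
strict_recycling(1:6 + 1:2)
#> [1] 2 4 4 6 6 8

# Non-conforming lengths now fail instead of warning.
res <- tryCatch(strict_recycling(1:6 + 1:4), error = function(e) "error")
res
#> [1] "error"
```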

HenrikBengtsson avatar Nov 17 '19 21:11 HenrikBengtsson

I don't mind recycling along dimensions when matrices are involved. But there's also:

matrix(0, nrow = 2, ncol = 3) + 1:3
#>      [,1] [,2] [,3]
#> [1,]    1    3    2
#> [2,]    2    1    3

Created on 2019-11-17 by the reprex package (v0.3.0)


Perhaps such "clever code" could unconditionally emit a (classed) warning that is muffled by default?

krlmlr avatar Nov 17 '19 22:11 krlmlr

I think anybody who is not a complete beginner in R is aware of vector recycling, or should be. Using the wrong operator, like == when one should be using %in% (as per the tweet that started this) is a user error, comparable to using = instead of ==, or <- in a function call where one meant to use =. It's valid code with its own purpose. It's not the toolbox's fault if you use a hammer to drive a screw.

Regarding the LETTERS example above, I don't consider it inconsistent behaviour. LETTERS != rev(LETTERS) so there is no reason why LETTERS == c("A", "B") should equal rev(LETTERS) == c("A", "B").

fruce-ki avatar Nov 18 '19 09:11 fruce-ki

Using the wrong operator, like == when one should be using %in% (as per the tweet that started this) is a user error

Yes. Language designers, like other people, make poor decisions sometimes. Now the question is whether they want to fix their mistakes later, or consider them user errors.

LETTERS != rev(LETTERS so there is no reason why

In R you need to close parens, so this is syntactically invalid code. You might call it user error, I guess. R of course gives an error for this, instead of interpreting it in some smart way. Just like it should for the non-scalar recycling.

gaborcsardi avatar Nov 18 '19 09:11 gaborcsardi

@gaborcsardi There, I corrected my typo. But massive congrats on completely missing the point.

Language designers, like other people, make poor decisions sometimes. Now the question is whether they want to fix their mistakes later, or consider them user errors.

How is what I said a design error? %in% and == are two different operators that do two different things. Not knowing which operator to use or what operators do is not a design error, it's an RTFM error. With or without recycling, using == instead of %in% would have been wrong in achieving the desired result. The fact that if R was fundamentally different == would have caused an error/warning is only a coincidence. If the vectors happened to have the same length it wouldn't trigger recycling, but it would still be wrong.

I'm going to do what language parsers don't and shouldn't, that is, assume you meant something completely different than what you actually typed, i.e. that you meant that vector recycling was a bad design idea. In which case I tend to agree. But it is out of the designer's control, regardless of whether they want to change it or not. R needs to keep backwards compatibility and recycling is a core behaviour that people eventually learn to work with and use. Changing it would have a massive domino effect. The whole R community would have to consent to this change and commit to revising and updating their packages and code.

fruce-ki avatar Nov 18 '19 10:11 fruce-ki

That you meant that vector recycling was a bad design idea. In which case I tend to agree.

Great! Actually, recycling scalars is a good idea, recycling vectors is not.

But it is out of the designer's control, regardless of whether they want to change it or not. R needs to keep backwards compatibility and recycling is a core behaviour that people eventually learn to work with and use

That is not correct. Only a handful of people have write access to the R source code, and they can do as they please. They'll certainly not ask the whole R community about it, as they never did in the past.

Every non-patch R version has breaking changes, smaller or bigger. E.g. a recent one is the change in S3 method search: https://developer.r-project.org/Blog/public/2019/08/19/s3-method-lookup/index.html But there are many others.

gaborcsardi avatar Nov 18 '19 10:11 gaborcsardi

I've added a second, less invasive proposal to the original post. We might be able to implement this change in a fully back-compatible "opt-in" way.

@HenrikBengtsson: Am I understanding top-level condition handlers correctly? Would such a TLCH (installed by {base}) be able to muffle all warnings of a certain class?

krlmlr avatar Nov 18 '19 11:11 krlmlr

Am I understanding top-level condition handlers correctly? Would such a TLCH (installed by {base}) be able to muffle all warnings of a certain class?

In theory yes, but I think TLCH are for the user to set up.

gaborcsardi avatar Nov 18 '19 11:11 gaborcsardi

Sure, one could make scripts safer against undesired auto-recycling with a TLCH. Another use case is evaluation of expressions supplied as user input -- in this case it seems that we can do something like

result <- tryCatch(
  expr,
  base.vector_recycling = function(e) {
    stop("Recycling not supported.", call. = FALSE)
  }
)

without having to resort to TLCH.
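
To make this concrete, here is a self-contained sketch that simulates the idea. Base R does not signal a base.vector_recycling condition today; recycling_add() is a made-up stand-in for an operator that would. It uses signalCondition(), which does nothing when no handler is established, to mimic "muffled by default".

```r
# Hypothetical stand-in for an operator that signals recycling of size != 1.
recycling_add <- function(x, y) {
  lens <- c(length(x), length(y))
  if (lens[1] != lens[2] && all(lens > 1L)) {
    cond <- structure(
      class = c("base.vector_recycling", "condition"),
      list(message = "Recycling a vector of size != 1.", call = sys.call())
    )
    signalCondition(cond)  # silent unless a handler is established
  }
  x + y
}

# No handler: existing code does not notice anything.
recycling_add(1:6, 1:2)
#> [1] 2 4 4 6 6 8

# Opt-in: catch the classed condition and turn it into an error.
# The outer tryCatch() only captures the error message for display.
outcome <- tryCatch(
  tryCatch(
    recycling_add(1:6, 1:2),
    base.vector_recycling = function(e) {
      stop("Recycling not supported.", call. = FALSE)
    }
  ),
  error = function(e) conditionMessage(e)
)
outcome
#> [1] "Recycling not supported."
```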

krlmlr avatar Nov 18 '19 11:11 krlmlr