purrr map_* with user-defined functions

This is a very basic issue, but I couldn't find any guidance on it through Googling or by searching through the current issues.

It's rather confusing, at least to me, why the first map_dbl() below works, but the last one doesn't.

library(purrr)

map_dbl(1:4, sin)
#> [1]  0.8414710  0.9092974  0.1411200 -0.7568025

one <- function() {1}
map_dbl(1:4, ~ one())
#> [1] 1 1 1 1

map_dbl(1:4, one)
#> Error in .f(.x[[i]], ...): unused argument (.x[[i]])

Created on 2020-12-19 by the reprex package (v0.3.0)

Dec 19 '20 22:12 behrman

This is because the one() function doesn't take any arguments, whereas the function created by ~ takes ... (dots) and so accepts whatever it is passed. You can't map values to a function that doesn't take any value.

Use as_function() to examine the function created by the formula:

rlang::as_function(~ one())
#> <lambda>
#> function (..., .x = ..1, .y = ..2, . = ..1)
#> one()
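
To make the difference concrete, here is a minimal sketch that calls both functions directly with an argument. The name f is introduced here only for illustration; everything else comes from the example above.

```r
library(rlang)

one <- function() 1

# The formula becomes a lambda whose signature starts with `...`,
# so arguments passed to it are accepted and never forwarded to one().
f <- as_function(~ one())
f(10)
f("a", "b")

# The bare function has an empty signature, so the same call is an error.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(one(10), error = function(e) conditionMessage(e))
res
```

This is the same situation map_dbl() runs into: it always hands the current element to the function, and only the lambda has somewhere to put it.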

By the way, community.rstudio.com or stackoverflow.com are better places for user questions.

Dec 20 '20 10:12 lionel-

Thanks for the explanation, Lionel. I had never encountered this before.

I'm writing materials for a book chapter on simulation, where functions often don't take arguments, and I'll be sure to include your explanation, so others won't get tripped up.

The reason I wrote this as an issue is that some may wish to use purrr in situations like this instead of what might have been done in base R using replicate(). From your answer, it appears as though there was a conscious decision to make this an error.

Dec 20 '20 17:12 behrman

PS Was it a conscious decision to have map_*() behave differently for the two cases I provided when the function has no arguments? There are use cases, such as simulation, where functions don't require arguments. Is there a good reason for the different behavior? I suspect that this will trip people up as it tripped me up.

Dec 20 '20 20:12 behrman

The behavior is different because you are doing different things. In the first case, map_dbl(1:4, ~ one()) is roughly the same as

for (i in 1:4)
  one()

whereas in the second case, map_dbl(1:4, one) is roughly the same as

for (i in 1:4)
  one(i)

If you instead used

one <- function(...) {1}

the result would be the same between the two cases.

When you do map(x, fun) you are passing the elements of x as an argument to fun(), which is equivalent to map(x, ~ fun(.)). When you do map(x, ~ fun()) you are calling fun() for each element of x, but not actually passing the argument to fun().
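
A short sketch of that equivalence, assuming only purrr is loaded. one_dots is a hypothetical name for the function(...) variant suggested above.

```r
library(purrr)

# Passing a bare function hands each element over as the first argument,
# so these two calls do the same work.
a <- map_dbl(1:4, sin)
b <- map_dbl(1:4, ~ sin(.x))

# A function that accepts `...` tolerates the element it is handed,
# so both calling styles succeed and agree.
one_dots <- function(...) 1
c1 <- map_dbl(1:4, one_dots)
c2 <- map_dbl(1:4, ~ one_dots())
```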

Dec 22 '20 22:12 mkoohafkan

Michael, thank you for your reply.

The purpose for my issue was to raise the question of whether it might be better to internally treat map(x, fun) as map(x, ~ fun()) instead of map(x, ~ fun(.)). The latter assumes that fun() has an argument, and there are valid cases, such as in simulation, where functions need not have arguments. Are there downsides to internally using the former?

Dec 22 '20 22:12 behrman

The purpose of map() is to iterate a function over a list or vector of argument values. This is similar to the *apply() family of functions, which also fail with your example:

one <- function() {1}
lapply(1:4, one)
## Error in FUN(X[[i]], ...) : unused argument (X[[i]])

If you're just repeatedly calling a function without the need to iterate over argument values, why not just use replicate()?

replicate(4, one())
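
For reference, a sketch of how replicate() behaves with richer return values. sim_mean and sim_df are hypothetical zero-argument simulation functions, not anything from purrr, and combining the pieces with do.call(rbind, ...) is just one base R option.

```r
set.seed(1)

# Hypothetical simulation returning a single number.
sim_mean <- function() mean(rnorm(10))

# Scalar results are simplified to a numeric vector by default.
means <- replicate(4, sim_mean())

# Hypothetical simulation returning a one-row data frame.
sim_df <- function() data.frame(mean = mean(rnorm(10)), sd = sd(rnorm(10)))

# simplify = FALSE keeps one list element per repetition,
# which can then be row-bound by hand.
runs <- replicate(3, sim_df(), simplify = FALSE)
combined <- do.call(rbind, runs)
```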

Dec 22 '20 22:12 mkoohafkan

Going back to the simulation example, the function called without arguments could return a complex object, such as a tibble. If map_*(x, fun) acted internally as map_*(x, ~ fun()), users could use the appropriate purrr function, such as map_dfr() to assemble the results of each iteration into the appropriate data structure.

Dec 22 '20 22:12 behrman

From the documentation of purrr::map():

The map functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input.

The entire purpose is to apply the function to each element of the input list or vector. map() is a replacement for lapply(), not replicate() (which is what your use case is). It makes sense that the default behavior should be to apply the function to each element of the input list or vector. Having the default behavior ignore the input argument does not make sense.

Two obvious downsides to forcing users to explicitly map arguments to the function:

1. In the case where you are iterating over a single argument, but also want to provide additional arguments that are not iterated over, you can do map(x, fun, ...) where ... are the optional, non-iterated arguments. If the default behavior was to ignore the first argument, then users would have to do map(x, ~ fun(., arg1, arg2, etc.)).
2. In the case of pmap(), you can provide a data frame that defines the set of iterations (see mapply() for a base R implementation), and its columns are matched to the function's named arguments. If the default behavior was to ignore the argument, then users would have to do pmap(list(x, y, z, etc.), ~ fun(arg1 = ..1, arg2 = ..2, arg3 = ..3, etc.)).
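
Both points can be illustrated with a small sketch of the current behavior. scale_shift and params are hypothetical names made up for this example.

```r
library(purrr)

# Extra arguments after the function are passed unchanged to every call:
# each element becomes the first argument of round(), digits stays fixed.
rounded <- map_dbl(c(1.234, 5.678), round, digits = 1)

# pmap_dbl() matches data frame columns to argument names, one row per call.
scale_shift <- function(x, y) x * 2 + y
params <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
shifted <- pmap_dbl(params, scale_shift)
```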

What are the downsides to using replicate() for your use case, or using ~fun()?

Dec 22 '20 23:12 mkoohafkan

I was not advocating ignoring the first argument. I was proposing that internally map(x, fun) use rlang::as_function(), as map(x, ~ fun()) does, as @lionel pointed out above. If this were used, the ... behavior and pmap_*() would work fine, just as they do with map(x, ~ fun()). Following the purrr::map() documentation, the list or atomic vector would still be mapped to the function. The function would simply have the option of ignoring it. This would be less restrictive and not more.

replicate() cannot combine results into a data structure the way map_dfr() does. I filed this issue with the concern that map(x, fun) has different behavior depending upon whether fun() has arguments or not, and this may trip up some users.

Dec 22 '20 23:12 behrman

What you might really want is a tidyverse equivalent of replicate(), which I wouldn't mind given the awkwardness of replicate() (the order of arguments, the long-to-type simplify argument that one almost always wants to set to FALSE, the risk of forgetting the latter...). You could have duplicate(), duplicate_dfr() etc.
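
A rough sketch of what such helpers could look like. duplicate() and duplicate_dfr() are hypothetical and do not exist in purrr or base R; the argument order and the use of base rbind() are assumptions made only for illustration.

```r
# Hypothetical helper: call f() n times and always return a list.
duplicate <- function(n, f, ...) {
  lapply(seq_len(n), function(i) f(...))
}

# Hypothetical helper: same, but row-bind the results into one data frame.
duplicate_dfr <- function(n, f, ...) {
  do.call(rbind, duplicate(n, f, ...))
}

one <- function() 1
ones <- duplicate(4, one)

# draw() is a made-up zero-argument simulation step.
draw <- function() data.frame(x = runif(1))
draws <- duplicate_dfr(3, draw)
```

Because the iteration index is never passed to f(), zero-argument functions work without a formula wrapper.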

Apr 21 '21 09:04 moodymudskipper

Even if we all agreed that this was a good idea, unfortunately making this change would be likely to break a lot of existing code, so it's not something that we can do.

Aug 24 '22 07:08 hadley
