Feature request: ability to keep/discard list elements by name

Open jnolis opened this issue 4 years ago • 9 comments

From the discussion in this Twitter thread, it seems there is a need to remove elements from lists by name. The current "best" solution is to assign NULL to an element using base R commands, which does not have an elegant tidy piping implementation. Since this is a fairly common task, it would be helpful to create a purrr function that can easily be put within a sequence of piped purrr calls.

One approach I have been thinking of after writing that last tweet would be to create a purrr::keep_names() and purrr::discard_names(). The point of these functions would be to closely mimic the existing purrr::keep() and purrr::discard(), but to apply the predicate to the names of the list rather than the values. They could also accept a vector of names as the input rather than a function, for the common case when you just want to keep/remove specific elements. So something like this:

library(purrr)

example <- as.list(1:5)
names(example) <- c("a", "b", "c", "rstudioconf_2022", "cat")

> keep_names(example, c("a","b"))
$a
[1] 1

$b
[1] 2

> discard_names(example, ~ .x %in% letters)
$rstudioconf_2022
[1] 4

$cat
[1] 5

And then in the case of the Twitter thread, Elaine could have just added discard_names("b") into her code.


Things I like about adding these functions:

  1. They solve a problem I personally have as well.
  2. They seem to open the door a little bit to more functions that work on names that could be useful in purrr. For example, I also do x <- setnames(x, x) a lot at the start of my purrr piping sequences and that could have a convenience function.

Things I dislike about adding these functions:

  1. They seem like they could be inefficient if the function is applied to each name independently. In the discard_names example above, the anonymous function would have been called once per name, but since %in% is vectorized, a single call across the whole vector of names would have been enough to get the logical results. On a list thousands of elements long this could be a problem.
  2. The set of functions that could theoretically get some sort of "name" equivalent (like map_names()) is so big that I do have some "slippery slope" fears of this going too far.
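
To make the efficiency concern in the first point concrete, here is a minimal base R sketch that counts how often a predicate is invoked when it is applied name by name versus once on the whole vector of names. The helper `make_counter()` is hypothetical and exists only for this illustration; it is not part of purrr.

```r
# Hypothetical helper: wraps a predicate and counts how often it is called.
make_counter <- function(pred) {
  calls <- 0L
  list(
    fn = function(x) {
      calls <<- calls + 1L
      pred(x)
    },
    count = function() calls
  )
}

nms <- c("a", "b", "c", "rstudioconf_2022", "cat")
is_letter <- function(x) x %in% letters

# Element-wise application: the predicate runs once per name.
per_name <- make_counter(is_letter)
res_each <- vapply(nms, per_name$fn, logical(1), USE.NAMES = FALSE)

# Vectorized application: the predicate runs once for all names.
whole <- make_counter(is_letter)
res_vec <- whole$fn(nms)

per_name$count()
whole$count()
identical(res_each, res_vec)
```

Both approaches give the same logical vector here; only the number of calls differs, which is what matters for long lists.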

These functions seem simple enough that I think I could make a PR to add them myself. I would love some feedback on whether other people would want them included or whether they should be changed somehow. Thank you!!

jnolis avatar Jan 23 '21 16:01 jnolis

stray observation: given the correspondence between lists and data-frames, could tidyselect be useful here?

ijlyttle avatar Jan 23 '21 16:01 ijlyttle

@ijlyttle You can experiment with this unexported function which implements tidyselect over all vector inputs:

list(a = 1, b = 2, aa = 3) %>%
  tidyselect:::select(starts_with("a"))
#> $a
#> [1] 1
#>
#> $aa
#> [1] 3

c(a = 1, b = 2, aa = 3) %>%
  tidyselect:::select(starts_with("a"))
#>  a aa
#>  1  3

lionel- avatar Jan 23 '21 16:01 lionel-

This is probably more a funs:: function than a purrr:: one though.

lionel- avatar Jan 23 '21 17:01 lionel-

Honestly if tidyselect:::select could become an exported function (and perhaps renamed to avoid confusion with dplyr::select) I think that would do exactly what I was looking for, right?

jnolis avatar Jan 23 '21 17:01 jnolis

Right. This is a big design decision though.

In the meantime you can add it to your set of helper functions if you'd like to use it right away:

vec_select <- function(.x, ..., .strict = TRUE) {
  pos <- tidyselect::eval_select(quote(c(...)), .x, strict = .strict)
  rlang::set_names(.x[pos], names(pos))
}

It might be slow with long vectors. Feel free to post any feedback in an issue on the tidyselect repo.

lionel- avatar Jan 23 '21 20:01 lionel-

> This is probably more a funs:: function than a purrr:: one though.

And in more complex cases, when the predicate function needs to operate both on the name and the value at the same time?

deeenes avatar Apr 23 '21 21:04 deeenes

> This is probably more a funs:: function than a purrr:: one though.

I strongly agree with deeenes here.

IMHO, one could expect a rather homogeneous design throughout the tidyverse, and dplyr::select() has gotten us used to more complex cases such as:

purrr::keep(example, c("a", starts_with("c"), where(~str_detect(.x, "\\d+"))))

It would be pretty awesome if purrr::keep() could behave exactly like dplyr::select() and could use both names and predicates (and even tidyhelpers if possible).

DanChaltiel avatar Oct 04 '21 08:10 DanChaltiel

For those looking for a simple and pipeable solution: the functions below only cover simple cases, but they can be modified to cover more general ones.

The easiest solution here is to use indexing rather than assigning NULL to the entry, unless for some reason that is a must.

The function naming can be improved (not my strongest point :) ), but I think the general gist of how to create these functions is there.

#' @param l a named list
#' @param kn a vector containing the names to keep

keep_names <- function(l, kn) {
  l[names(l) %in% kn]
}

x <- list(a = 1, b = 2, c = 3)
keep_names(x, "a")
# $a
# [1] 1

keep_names(x, c("a", "b"))
# $a
# [1] 1
# 
# $b
# [1] 2

#' @param l a named list
#' @param fn a function that will receive a character vector of the names. Must be vectorized and return one TRUE/FALSE value per name.

keep_names_func <- function(l, fn) {
  l[fn(names(l))]
}

x <- list(ka = 1, kb = 2, c = 3)
keep_names_func(x, function(n){startsWith(n, "k")})
# $ka
# [1] 1
# 
# $kb
# [1] 2

# Or even one with both names and value

#' @param l a named list
#' @param fn a function that will receive the vector of names and the list of values. Must be vectorized and return one TRUE/FALSE value per element.

keep_names_func_both <- function(l, fn) {
  l[fn(names(l), l)]
}

x <- list(ka = 1, kb = 2, c = 3)
keep_names_func_both(x, function(n, v){startsWith(n, "k") & v>=2})
# $kb
# [1] 2
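
If the predicate is not vectorized, the same idea can be applied one element at a time. The following is only a sketch in base R; `keep_if_both()` is a hypothetical name, and the predicate is assumed to return a single TRUE/FALSE for each name/value pair.

```r
#' @param l a named list
#' @param fn a function of (name, value) returning a single TRUE/FALSE
keep_if_both <- function(l, fn) {
  nms <- names(l)
  # Call the predicate once per element; anything other than TRUE drops it.
  keep <- vapply(
    seq_along(l),
    function(i) isTRUE(fn(nms[[i]], l[[i]])),
    logical(1)
  )
  l[keep]
}

x <- list(ka = 1, kb = 2:3, c = "k")
out <- keep_if_both(x, function(n, v) {
  startsWith(n, "k") && is.numeric(v) && length(v) > 1
})
out
```

This trades speed for flexibility: values no longer need to be comparable as a whole list, but the predicate is called once per element.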

zsigmas avatar Dec 03 '21 11:12 zsigmas

I came across purrr's modify_at() function, which seems to have everything that's needed, including tidyselect.

Two problems, though:

  1. vars(), which I understand is rlang::quos(), is not available in purrr.
  2. It doesn't seem to work.

I figured I'd put it in front of the group, using @jnolis' example:

library("purrr")

example <- as.list(1:5)
names(example) <- list("a", "b", "c", "rstudioconf_2022", "cat")

# this seems like it should work, but it doesn't
modify_at(example, rlang::quos(any_of(letters)), ~NULL)
#> $b
#> [1] 2
#> 
#> $rstudioconf_2022
#> [1] 4

# same thing - the sets should be complementary, but they aren't
modify_at(example,  rlang::quos(!any_of(letters)), ~NULL)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3
#> 
#> $cat
#> [1] 5

Created on 2022-01-05 by the reprex package (v2.0.1)

Of course, I could be doing something wrong™️.

ijlyttle avatar Jan 05 '22 23:01 ijlyttle

I think these are interesting ideas but I don't quite see how they fit into purrr. A straightforward implementation of keep_names() and discard_names() feels a bit too simple for purrr:

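discard_names <- function(.x, .p, ...) {
  sel <- .p(names(.x))
  # Treat NA results as "no match", so those elements are neither kept nor discarded by mistake
  .x[!is.na(sel) & !sel]
}

keep_names <- function(.x, .p, ...) {
  sel <- .p(names(.x))
  .x[!is.na(sel) & sel]
}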

And we're currently moving away from tidyselect usage in purrr, because NSE just doesn't feel very "purrr-like".

But maybe we could make something a bit more flexible?

keep_names <- function(.x, .names, ...) {
  if (is.character(.names)) {
    idx <- intersect(names(.x), .names)
  } else if (is.function(.names) || rlang::is_formula(.names)) {
    # Convert formulas such as ~ .x %in% LETTERS into a function
    .names <- rlang::as_function(.names)
    idx <- .names(names(.x))

    if (is.logical(idx)) {
      idx[is.na(idx)] <- FALSE
    } else if (is.character(idx)) {
      idx <- intersect(names(.x), idx)
    } else if (!is.integer(idx)) {
      rlang::abort("If `.names` is a function, it must return a logical, integer, or character vector")
    }
  } else {
    rlang::abort("`.names` must be a character vector, a function, or a formula")
  }
  .x[idx]
}

Then you could write x |> keep_names("foo") or x |> keep_names(~ .x %in% LETTERS) etc.

hadley avatar Aug 27 '22 22:08 hadley

That seems like a reasonable compromise to me! I'd also have the negation for discard_names, but that simple implementation covers the cases I was thinking of when I wrote this.
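
For illustration, the negated variant could look roughly like the following. This is a base R sketch only, not the purrr implementation: it accepts a character vector or a plain function (no formula support, since that would need rlang), and the error messages are made up for this example.

```r
# Sketch of a discard counterpart: drop elements whose names match `.names`.
discard_names <- function(.x, .names) {
  x_names <- names(.x)
  if (is.null(x_names)) {
    stop("`.x` must be named")
  }

  if (is.character(.names)) {
    drop <- x_names %in% .names
  } else if (is.function(.names)) {
    drop <- .names(x_names)
    if (is.character(drop)) {
      # The function returned names to drop
      drop <- x_names %in% drop
    } else if (is.logical(drop) && length(drop) == length(x_names)) {
      # NA means "unknown", so keep those elements
      drop[is.na(drop)] <- FALSE
    } else {
      stop("`.names` must return a character vector or one logical per name")
    }
  } else {
    stop("`.names` must be a character vector or a function")
  }

  .x[!drop]
}

x <- list(a = 1, b = 2, B = 3)
discard_names(x, "b")
discard_names(x, function(n) n %in% LETTERS)
```

Building a logical `drop` vector in every branch keeps the negation to a single `!drop` at the end.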

jnolis avatar Aug 28 '22 04:08 jnolis

Just realised that these should probably be keep_at() and discard_at(), and we should extend the same handling of names and integers to map_at(), modify_at(), etc.

hadley avatar Sep 08 '22 22:09 hadley

That makes sense to me! a map_at() that lets you programmatically rename a list seems especially convenient.

jnolis avatar Sep 09 '22 02:09 jnolis

Some progress:

keep_at <- function(.x, .names, ...) {
  if (!rlang::is_named(.x)) {
    cli::cli_abort("{.arg .x} must be named")
  }
  x_names <- names(.x)

  if (is.character(.names)) {
    idx <- intersect(.names, x_names)
  } else if (is.function(.names) || rlang::is_formula(.names)) {
    # Convert formulas to functions so the call below works for both
    .names <- rlang::as_function(.names)
    idx <- .names(x_names, ...)

    if (is.logical(idx)) {
      if (length(idx) != length(x_names)) {
        cli::cli_abort("Result of `.names()` must be length {length(x_names)}, not {length(idx)}.")
      }
      idx[is.na(idx)] <- FALSE
    } else if (is.character(idx)) {
      idx <- intersect(names(.x), idx)
    } else {
      cli::cli_abort("If {.arg .names} is a function, it must return a logical or character vector, not {.obj_type_friendly {idx}}.")
    }
  } else {
    # Copy to a name without a leading dot so cli does not parse it as a style
    names <- .names
    cli::cli_abort("{.arg .names} must be a function or a character vector, not {.obj_type_friendly {names}}.")
  }
  .x[idx]
}

@jnolis to be clear, map_at() and friends already exist and only apply the transformation to the named elements.
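
To illustrate the distinction, here is a rough base R sketch of the "transform only the selected elements" semantics. `map_at_sketch()` is a hypothetical stand-in that only handles a character vector of names; it is not purrr's actual map_at() implementation.

```r
# Apply `.f` only to the elements whose names are in `.at`;
# all other elements are returned unchanged and nothing is dropped.
map_at_sketch <- function(.x, .at, .f) {
  sel <- names(.x) %in% .at
  .x[sel] <- lapply(.x[sel], .f)
  .x
}

x <- list(a = 1, b = 2, c = 3)
res <- map_at_sketch(x, c("a", "c"), function(v) v * 10)
res
```

The result has the same length and names as the input, whereas the keep/discard functions discussed above change which elements are present.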

hadley avatar Sep 09 '22 12:09 hadley