Named vector from list

Open jennybc opened this issue 7 years ago • 11 comments

I frequently want to create a named atomic vector by extracting from a homogeneous list:

  • values = elements extracted via name "foo"
  • names = elements extracted via name "bar"

Here's an example for the specific case where the target is a character vector:

library(purrr)
## install_github("jennybc/repurrrsive")
library(repurrrsive)
enlist_chr <- function(l, name, value) {
  nms <- map_chr(l, name)
  vals <- map_chr(l, value)
  set_names(vals, nms)
}
enlist_chr(got_chars[c(1, 3, 10, 29)], "name", "culture")
#>    Arya Stark Catelyn Stark Theon Greyjoy       Varamyr 
#>    "Northmen"    "Rivermen"    "Ironborn"   "Free Folk"

Or here's a more general version, used to name the list itself based on contents.

enlist <- function(l, fname, fvalue) {
  ## as_function() was later renamed as_mapper() in purrr
  nms <- map_chr(l, as_mapper(fname))
  vals <- map(l, as_mapper(fvalue))
  set_names(vals, nms)
}
names(got_chars)
#> NULL
enlist(got_chars[1:4], "name", identity) %>% names()
#> [1] "Arya Stark"    "Brandon Stark" "Catelyn Stark" "Eddard Stark"

Other functions in this space: enlist_*() would produce named vector input suitable for the aspirational mapnm_*() from #240. It's also related to tibble::enframe(), which takes a named vector as input and returns a two-column tibble. An inverse of tibble::enframe() has been proposed (https://github.com/tidyverse/tibble/issues/146).
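
For reference, a minimal sketch of what that inverse looks like in use, assuming a tibble release that exports deframe(). The `chars` list is made-up toy data standing in for got_chars, and the column names `name` and `value` are only illustrative:

```r
library(purrr)
library(tibble)

# Toy stand-in for a homogeneous list such as got_chars (made-up data)
chars <- list(
  list(name = "Arya Stark", culture = "Northmen"),
  list(name = "Theon Greyjoy", culture = "Ironborn")
)

# Build a two-column tibble from the list, then collapse it to a named vector.
# deframe() takes names from the first column and values from the second.
pairs <- tibble(
  name = map_chr(chars, "name"),
  value = map_chr(chars, "culture")
)
cultures <- deframe(pairs)
cultures
```

This gives the same kind of result as enlist_chr() above, at the cost of building an intermediate tibble.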

jennybc avatar Oct 29 '16 23:10 jennybc

This feels slightly too specific to me - I'm pretty leery about adding more functions that need type suffixes. OTOH, it feels useful, and there might be other functions in this class.

hadley avatar Mar 05 '17 06:03 hadley

You can also write it like:

got_chars[c(1, 3, 10, 29)]  %>%
  set_names(map_chr(., "name")) %>% 
  map_chr("culture")

This seems like a special case of the general en-tibble function we talked about a long time ago. It would also be nice to be able to reduce the duplication in something like this:

suppressPackageStartupMessages(library(tidyverse))
library(repurrrsive)

tibble(
  name = got_chars %>% map_chr("name"),
  gender = got_chars %>% map_chr("gender"),
  alive = got_chars %>% map_lgl("alive"),
  titles = got_chars %>% map("titles")
)
#> # A tibble: 29 × 4
#>                  name gender alive    titles
#>                 <chr>  <chr> <lgl>    <list>
#> 1       Theon Greyjoy   Male  TRUE <chr [3]>
#> 2    Tyrion Lannister   Male  TRUE <chr [2]>
#> 3   Victarion Greyjoy   Male  TRUE <chr [2]>
#> 4                Will   Male FALSE    <NULL>
#> 5          Areo Hotah   Male  TRUE <chr [1]>
#> 6               Chett   Male FALSE    <NULL>
#> 7             Cressen   Male FALSE <chr [1]>
#> 8     Arianne Martell Female  TRUE <chr [1]>
#> 9  Daenerys Targaryen Female  TRUE <chr [5]>
#> 10     Davos Seaworth   Male  TRUE <chr [4]>
#> # ... with 19 more rows

hadley avatar Mar 07 '17 02:03 hadley

Is this connected to readr's column specifications? I.e. you take the transpose of got_chars and specify columns and their types.

got_chars %>% transpose_tibble(cols(
  name = col_character(),
  gender = col_character(),
  alive = col_logical(),
  titles = col_list(),
))

Not a great improvement over Hadley's typed map, but I thought this connection is interesting.

In any case all these are variations of transpose. We have a list of similar objects (an array of structs in C terms), and we create a new object based on the object components (struct of arrays).

lionel- avatar Mar 07 '17 10:03 lionel-

By the way, I'm thinking about implementing base-type constructors with splicing semantics in rlang. So we could overscope the contents of the list and have something like this:

transposing(got_chars,
  set_names(chr(culture), chr(name))
)

lionel- avatar Mar 07 '17 10:03 lionel-

Maybe that's the right way to think about it — we want a package for reading data frames from trees (both lists and xml)

hadley avatar Mar 07 '17 17:03 hadley

@jennybc maybe move this to a mythical tidytree package?

hadley avatar Mar 08 '17 01:03 hadley

Sure! It has begun to dawn on me that JSON --> list --> transpose is awfully close to the tibble you usually want. Especially if supplemented by the ability to promote/simplify list elements. Re: named vector, there are clearly a lot of options in this discussion.

jennybc avatar Mar 08 '17 02:03 jennybc

Ah yes, so this is basically equivalent:

suppressPackageStartupMessages(library(tidyverse))
library(repurrrsive)

tr <- got_chars %>% 
  map(`[`, c("name", "gender", "alive", "titles")) %>% 
  transpose()

tibble(
  name = tr$name %>% flatten_chr(),
  gender = tr$gender %>% flatten_chr(),
  alive = tr$alive %>% flatten_lgl(),
  titles = tr$titles
)

And you could write that as

extract <- list(
  name = flatten_chr,
  gender = flatten_chr,
  alive = flatten_lgl,
  titles = identity
)
 
got_chars %>%
  map(`[`, names(extract)) %>% 
  transpose() %>% 
  {imap(extract, function(f, i) f(.[[i]]))} %>% 
  as_tibble()

hadley avatar Mar 08 '17 03:03 hadley

For cases where none of the resultant columns will be list-cols (e.g. if an API guarantees each sub-list entry is a single value), what about this approach?

got_chars %>%
  map(`[`, c("name", "gender", "alive")) %>%
  map_df(as_tibble)

For fields that are variable-length, the resultant data frame can always be nested or group_by-ed to re-collapse as needed.

Is this pattern frowned upon (or possibly un-performant)?

mmuurr avatar Mar 26 '18 03:03 mmuurr

With modern tooling, one approach would be to wrap tidyr::unchop():

list_transpose_df <- function(x, ..., unchop) {
  ellipsis::check_dots_empty()
  out <- transpose(x)
  out <- as_tibble(out)
  tidyr::unchop(out, {{ unchop }})
}

got_chars %>% list_transpose_df(unchop = c(name, culture, id))
#> # A tibble: 30 x 18
#>   url      id name  gender culture born  died  alive titles aliases father mother spouse
#>   <lis> <int> <chr> <list> <chr>   <lis> <lis> <lis> <list> <list>  <list> <list> <list>
#> 1 <chr…  1022 Theo… <chr … "Ironb… <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 2 <chr…  1052 Tyri… <chr … ""      <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 3 <chr…  1074 Vict… <chr … "Ironb… <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 4 <chr…  1109 Will  <chr … ""      <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> # … with 26 more rows, and 5 more variables: allegiances <list>, books <list>,
#> #   povBooks <list>, tvSeries <list>, playedBy <list>

This would require explicit input from users. Alternatively, we could use vec_simplify() from #778 to try and create atomic vectors automatically:

list_transpose_df2 <- function(x) {
  out <- transpose(x)
  out <- map(out, vec_simplify)
  out <- as_tibble(out)
  out
}

got_chars %>% list_transpose_df2()
#> # A tibble: 30 x 18
#>   url      id name  gender culture born  died  alive titles aliases father mother spouse
#>   <chr> <int> <chr> <chr>  <chr>   <chr> <chr> <lgl> <list> <list>  <chr>  <chr>  <chr>
#> 1 http…  1022 Theo… Male   "Ironb… "In … ""    TRUE  <chr … <chr [… ""     ""     ""
#> 2 http…  1052 Tyri… Male   ""      "In … ""    TRUE  <chr … <chr [… ""     ""     "http…
#> 3 http…  1074 Vict… Male   "Ironb… "In … ""    TRUE  <chr … <chr [… ""     ""     ""
#> 4 http…  1109 Will  Male   ""      ""    "In … FALSE <chr … <chr [… ""     ""     ""
#> # … with 26 more rows, and 5 more variables: allegiances <list>, books <list>,
#> #   povBooks <list>, tvSeries <list>, playedBy <list>

lionel- avatar Jul 31 '20 13:07 lionel-

I created a package tibblify for rectangling nested lists. Also see tidyverse/tidyr/issues/835 where I explained my motivation. I tried to follow the principles in vctrs. In case you consider it a good approach I'd be happy to include it in purrr or tidyr.

mgirlich avatar Aug 10 '20 12:08 mgirlich

I think this need is now mostly resolved by the new unnest tools in tidyr and the tibblify package.
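
As a rough sketch of the tidyr route, assuming tidyr 1.0.0 or later (where hoist() is available): put the list in a list-column and pull out the wanted components by name. The `chars` list is made-up toy data standing in for got_chars, and the column name `char` is only illustrative:

```r
library(tibble)
library(tidyr)

# Toy stand-in for got_chars (made-up data)
chars <- list(
  list(name = "Arya Stark", alive = TRUE, titles = c("Princess", "Lady")),
  list(name = "Will", alive = FALSE, titles = character(0))
)

# One row per character, with each record held in a list-column
df <- tibble(char = chars)

# hoist() extracts the named components into their own columns.
# Components that are length 1 in every row are simplified to atomic
# vectors; the variable-length titles component stays a list-column.
out <- hoist(df, char, name = "name", alive = "alive", titles = "titles")
out
```

Compared with the tibble() plus map_*() version earlier in the thread, each component is named once and no type suffix is needed.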

hadley avatar Aug 24 '22 11:08 hadley