purrr
Named vector from list
I frequently want to create a named atomic vector by extracting from a homogeneous list:
- values = elements extracted via name "foo"
- names = elements extracted via name "bar"
Here's an example for the specific case where the target is a character vector:
library(purrr)
## install_github("jennybc/repurrrsive")
library(repurrrsive)
enlist_chr <- function(l, name, value) {
  nms <- map_chr(l, name)
  vals <- map_chr(l, value)
  set_names(vals, nms)
}
enlist_chr(got_chars[c(1, 3, 10, 29)], "name", "culture")
#>    Arya Stark Catelyn Stark Theon Greyjoy       Varamyr
#>    "Northmen"    "Rivermen"    "Ironborn"   "Free Folk"
Or here's a more general version, used to name the list itself based on contents.
enlist <- function(l, fname, fvalue) {
  ## as_mapper() is the current purrr name for what used to be as_function()
  nms <- map_chr(l, as_mapper(fname))
  vals <- map(l, as_mapper(fvalue))
  set_names(vals, nms)
}
names(got_chars)
#> NULL
enlist(got_chars[1:4], "name", identity) %>% names()
#> [1] "Arya Stark" "Brandon Stark" "Catelyn Stark" "Eddard Stark"
Other functions in this space: enlist_*() would produce named vector input suitable for the aspirational mapnm_*() from #240. It's also related to tibble::enframe(), which takes a named vector as input and returns a two-column tibble. An inverse of tibble::enframe() has been proposed (https://github.com/tidyverse/tibble/issues/146).
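For reference, current tibble releases export deframe(), which plays that inverse role. A minimal sketch of the round trip, using a toy named vector rather than got_chars (the values are illustrative only):

```r
library(tibble)

# Toy named character vector; names and values are illustrative.
cultures <- c(`Arya Stark` = "Northmen", `Theon Greyjoy` = "Ironborn")

# enframe(): named vector -> two-column tibble with columns `name` and `value`.
df <- enframe(cultures)

# deframe(): two-column tibble -> vector, with the first column used as names.
back <- deframe(df)

names(back)
unname(back)
```

So an enlist_chr()-style result can be moved to and from a two-column tibble without any extra helper.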
This feels slightly too specific to me - I'm pretty leery about adding more functions that need type suffixes. OTOH, it feels useful, and there might be other functions in this class.
You can also write it like:
got_chars[c(1, 3, 10, 29)] %>%
  set_names(map_chr(., "name")) %>%
  map_chr("culture")
This seems like a special case of the general en-tibble function we talked about a long time ago. It would also be nice to be able to reduce the duplication in something like this:
suppressPackageStartupMessages(library(tidyverse))
library(repurrrsive)
tibble(
  name = got_chars %>% map_chr("name"),
  gender = got_chars %>% map_chr("gender"),
  alive = got_chars %>% map_lgl("alive"),
  titles = got_chars %>% map("titles")
)
#> # A tibble: 29 × 4
#> name gender alive titles
#> <chr> <chr> <lgl> <list>
#> 1 Theon Greyjoy Male TRUE <chr [3]>
#> 2 Tyrion Lannister Male TRUE <chr [2]>
#> 3 Victarion Greyjoy Male TRUE <chr [2]>
#> 4 Will Male FALSE <NULL>
#> 5 Areo Hotah Male TRUE <chr [1]>
#> 6 Chett Male FALSE <NULL>
#> 7 Cressen Male FALSE <chr [1]>
#> 8 Arianne Martell Female TRUE <chr [1]>
#> 9 Daenerys Targaryen Female TRUE <chr [5]>
#> 10 Davos Seaworth Male TRUE <chr [4]>
#> # ... with 19 more rows
Is this connected to readr's column specifications? I.e. you take the transpose of got_chars and specify columns and their types.
got_chars %>% transpose_tibble(cols(
  name = col_character(),
  gender = col_character(),
  alive = col_logical(),
  titles = col_list(),
))
Not a great improvement over Hadley's typed map, but I thought the connection was interesting.
In any case all these are variations of transpose. We have a list of similar objects (an array of structs in C terms), and we create a new object based on the object components (struct of arrays).
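To make the array-of-structs to struct-of-arrays idea concrete, here is a minimal sketch on toy data (the three records below are made up for illustration, not taken from got_chars). It uses purrr::transpose(), as the rest of this thread does; note that purrr 1.0.0 supersedes it in favour of list_transpose().

```r
library(purrr)

# Array of structs: one small list per character (toy data).
aos <- list(
  list(name = "Arya Stark", alive = TRUE),
  list(name = "Theon Greyjoy", alive = TRUE),
  list(name = "Will", alive = FALSE)
)

# transpose() turns the list inside out: one element per field.
soa <- transpose(aos)
names(soa)

# Each field is still a list, so a typed map is needed to get atomic vectors.
nms <- map_chr(soa$name, identity)
alive <- map_lgl(soa$alive, identity)

# The named vector from the top of the thread then falls out directly.
set_names(alive, nms)
```

The typed maps at the end are the only place where the column types have to be spelled out, which is the duplication the proposals here try to remove.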
By the way, I'm thinking about implementing base-type constructors with splicing semantics in rlang. So we could overscope the contents of the list and have something like this:
transposing(got_chars,
  set_names(chr(culture), chr(name))
)
Maybe that's the right way to think about it — we want a package for reading data frames from trees (both lists and xml)
@jennybc maybe move this to a mythical tidytree package?
Sure! It has begun to dawn on me that JSON --> list --> transpose is awfully close to the tibble you usually want. Especially if supplemented by the ability to promote/simplify list elements. Re: named vector, there are clearly a lot of options in this discussion.
Ah yes, so this is basically equivalent:
suppressPackageStartupMessages(library(tidyverse))
library(repurrrsive)
tr <- got_chars %>%
  map(`[`, c("name", "gender", "alive", "titles")) %>%
  transpose()
tibble(
  name = tr$name %>% flatten_chr(),
  gender = tr$gender %>% flatten_chr(),
  alive = tr$alive %>% flatten_lgl(),
  titles = tr$titles
)
And you could write that as
extract <- list(
  name = flatten_chr,
  gender = flatten_chr,
  alive = flatten_lgl,
  titles = identity
)
got_chars %>%
  map(`[`, names(extract)) %>%
  transpose() %>%
  {imap(extract, function(f, i) f(.[[i]]))} %>%
  as_tibble()
For cases where none of the resultant columns will be list-cols (e.g. if an API guarantees each sub-list entry is a single value), what about this approach?
got_chars %>%
  map(`[`, c("name", "gender", "alive")) %>%
  map_df(as_tibble)
For fields that are variable-length, the resultant data frame can always be nest()-ed or group_by()-ed to re-collapse as needed.
Is this pattern frowned upon (or possibly un-performant)?
With modern tooling, one approach would be to wrap tidyr::unchop():
list_transpose_df <- function(x, ..., unchop) {
  ellipsis::check_dots_empty()
  out <- transpose(x)
  out <- as_tibble(out)
  tidyr::unchop(out, {{ unchop }})
}
got_chars %>% list_transpose_df(unchop = c(name, culture, id))
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father mother spouse
#> <lis> <int> <chr> <list> <chr> <lis> <lis> <lis> <list> <list> <list> <list> <list>
#> 1 <chr… 1022 Theo… <chr … "Ironb… <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 2 <chr… 1052 Tyri… <chr … "" <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 3 <chr… 1074 Vict… <chr … "Ironb… <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> 4 <chr… 1109 Will <chr … "" <chr… <chr… <lgl… <chr … <chr [… <chr … <chr … <chr …
#> # … with 26 more rows, and 5 more variables: allegiances <list>, books <list>,
#> # povBooks <list>, tvSeries <list>, playedBy <list>
This would require explicit input from users. Alternatively, we could use vec_simplify() from #778 to try and create atomic vectors automatically:
list_transpose_df2 <- function(x) {
  out <- transpose(x)
  out <- map(out, vec_simplify)
  out <- as_tibble(out)
  out
}
got_chars %>% list_transpose_df2()
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father mother spouse
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr> <chr> <chr>
#> 1 http… 1022 Theo… Male "Ironb… "In … "" TRUE <chr … <chr [… "" "" ""
#> 2 http… 1052 Tyri… Male "" "In … "" TRUE <chr … <chr [… "" "" "http…
#> 3 http… 1074 Vict… Male "Ironb… "In … "" TRUE <chr … <chr [… "" "" ""
#> 4 http… 1109 Will Male "" "" "In … FALSE <chr … <chr [… "" "" ""
#> # … with 26 more rows, and 5 more variables: allegiances <list>, books <list>,
#> # povBooks <list>, tvSeries <list>, playedBy <list>
I created a package tibblify for rectangling nested lists. Also see tidyverse/tidyr/issues/835 where I explained my motivation.
I tried to follow the principles in vctrs. If you consider it a good approach, I'd be happy to include it in purrr or tidyr.
I think this need is now mostly resolved by the new unnest tools in tidyr and the tibblify package.
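As a rough illustration of what the tidyr route looks like, here is a sketch using hoist() and unnest_wider() on a toy two-element list (the records are invented for the example; with real data you would start from got_chars):

```r
library(tibble)
library(tidyr)

# Toy stand-in for got_chars: a list of records with one variable-length field.
chars <- list(
  list(name = "Arya Stark", culture = "Northmen", titles = "Princess"),
  list(name = "Theon Greyjoy", culture = "Ironborn",
       titles = c("Prince of Winterfell", "Lord of the Iron Islands"))
)

# Put the records in a list-column so the rectangling verbs can work on them.
df <- tibble(char = chars)

# hoist(): pull selected components out into their own simplified columns.
picked <- hoist(df, char, name = "name", culture = "culture")

# unnest_wider(): spread every component into a column; fields whose lengths
# differ across records (titles here) stay as list-columns.
wide <- unnest_wider(df, char)

picked$name
wide$culture
```

The named vector from the original request is then one step away, for example via tibble::deframe() on the name and culture columns.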