
Exposing cell content in list-columns?

Open courtiol opened this issue 4 years ago • 6 comments

When displaying tibbles with list-columns, it would be nice to be able to give a glimpse of the content within each cell. For example, if the width is sufficient, instead of:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 <int [2]>  
2 <int [100]>

it would be great to have something like what str() produces:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 int [1:2] 1 2
2 int [1:100] 1 2 3 4 5 6 7 8 9 10 ...

I guess this could be done by defining one's own class and pillar method, but I think that it would be useful for any tibble. Perhaps whether to expose the content or not could be set with a global formatting option.

One motivation is that it could play well with dplyr::summarise() when using functions that do not return scalars:

> iris %>%
+   group_by(Species) %>%
+   summarise(range = list(range(Sepal.Length)),
+             quartiles = list(quantile(Sepal.Length)))
# A tibble: 3 x 3
  Species    range     quartiles
  <fct>      <list>    <list>   
1 setosa     <dbl [2]> <dbl [5]>
2 versicolor <dbl [2]> <dbl [5]>
3 virginica  <dbl [2]> <dbl [5]>

A difficulty is that any kind of content can be nested within a cell, not just vectors, but perhaps specific displays could be set up for the main classes.

This is probably an issue for pillar, but the motivation is the display of tibbles.

courtiol • May 25 '21 17:05

Thanks. I think a way to move forward could indeed be the creation of a custom class that applies the desired formatting. If this is useful and stable, we might incorporate a variant in pillar.

krlmlr • Jun 09 '21 04:06

I don't know anything about pillar & vctrs, so I don't know how stable the code below may be, but here is a simple proof of concept:

list_col <- function(x) {
  vctrs::new_vctr(x, class = "list_col")
}

formatter_list_element <- function(x, width) {
  start_txt <- "<"
  end_txt <- ">"
  ptype_txt  <-  vctrs::vec_ptype_abbr(x) # note: not working if element is not a vector (e.g. a function), do we care?
  context_text <- ifelse(length(x) > 0,
                         paste0(" [", length(x), "] ",
                                toString(x,
                                         width = width - nchar(ptype_txt) - nchar(paste0("<[]>", length(x))))),
                         "")
  paste0(start_txt, ptype_txt, context_text, end_txt)
}

format.list_col <- function(x, ..., width = 25, formatter = formatter_list_element) {
  res <- purrr::map_chr(x, ~ formatter(.x, width))
  format(res, justify = "left")
}

vec_ptype_abbr.list_col <- function(x) {
  "list-col"
}

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 25) # how to define width?
  pillar::new_pillar_shaft_simple(out, min_width = 10) # what should min_width be?
}

## Example 1:
x <- list(1:2, TRUE, NA, NULL, 1.3, list(1, b = 2:10), matrix(1:9, nrow = 3))
     
y <- list_col(x)

tibble::tibble(x = x, y = y)
#> # A tibble: 7 x 2
#>   x                y                          
#>   <list>           <list-col>                 
#> 1 <int [2]>        <int [2] 1, 2>             
#> 2 <lgl [1]>        <lgl [1] TRUE>             
#> 3 <lgl [1]>        <lgl [1] NA>               
#> 4 <NULL>           <NULL>                     
#> 5 <dbl [1]>        <dbl [1] 1.3>              
#> 6 <named list [2]> <named list [2] 1, 2:10>   
#> 7 <int [3 × 3]>    <int[,3] [9] 1, 2, 3, ....>

# note: display could be improved:
# - in console, colors are not consistent
# - matrix dim are weird
# - named list don't show names

## Example 2:
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris %>%
  group_by(Species) %>%
  summarise(range = list_col(list(range(Sepal.Length))),
            quartiles = list_col(list(quantile(Sepal.Length))))
#> # A tibble: 3 x 3
#>   Species    range              quartiles                  
#>   <fct>      <list-col>         <list-col>                 
#> 1 setosa     <dbl [2] 4.3, 5.8> <dbl [5] 4.3, 4.8, 5, ....>
#> 2 versicolor <dbl [2] 4.9, 7>   <dbl [5] 4.9, 5.6, 5.9....>
#> 3 virginica  <dbl [2] 4.9, 7.9> <dbl [5] 4.9, 6.225, 6....>

Created on 2021-06-09 by the reprex package (v2.0.0)

courtiol • Jun 09 '21 15:06

Nice!

@hadley: What do you think?

krlmlr • Jun 09 '21 16:06

Seems like a reasonable idea, but I'd want to see a fuller exploration of what would be displayed for types other than atomic vector.

hadley • Jun 14 '21 19:06

Default outputs for non-atomic vectors

Redefining formatter_list_element() above as:

formatter_list_element <- function(x, width) {
  ptype_txt  <- pillar::obj_sum(x)
  context_text <- ifelse(length(x) > 0,
                         paste0(" ", toString(x, width = width - nchar(ptype_txt) - 3L)),
                         "")
  paste0("<", ptype_txt, context_text, ">")
}

to benefit from the dimensions and ptype extracted by pillar::obj_sum(), and increasing the width in pillar_shaft.list_col() to 50 to show here more of the output,

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 50)
  pillar::new_pillar_shaft_simple(out, min_width = 10)
}

we get the following for types other than atomic vectors:

> x <- list(a = matrix(1:9, nrow = 3), b = array(1:27, dim = c(3, 3, 3)), c = list(z = 1, zz = list(1, 2)))
> y <- list_col(x)
> tibble::tibble(x = x, y = y)
# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9>           
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9, 1....>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

The first 2 rows are not that different from what str() (and thus glimpse()) does:

> str(x)
List of 3
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 $ b: int [1:3, 1:3, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 $ c:List of 2
  ..$ z : num 1
  ..$ zz:List of 2
  .. ..$ : num 1
  .. ..$ : num 2

The list looks quite different, since it is compacted into a single row for the tibble display. As lists are Pandora's boxes, perhaps we could also opt not to reveal their guts...

For fun, I tried a list of objects of class lm:

> iris %>%
+   group_nest(Species) %>%
+   rowwise() %>%
+   summarise(lm = list(lm(Sepal.Length ~ Petal.Length, data = data))) %>%
+   mutate(lm = list_col(lm))
`summarise()` has ungrouped output. You can override using the `.groups` argument.
# A tibble: 3 x 1
  lm                                                                                                  
  <list-col>                                                                                          
1 <lm c(`(Intercept)` = 4.21316822303424, Petal.Length = 0.542292597103803), c(`1` = 0.1276221410....>
2 <lm c(`(Intercept)` = 2.40752310536045, Petal.Length = 0.828280961182994), c(`1` = 0.6995563770....>
3 <lm c(`(Intercept)` = 1.05965909090909, Petal.Length = 0.995738636363637), c(`1` = -0.734090909....>

That could certainly be improved but that shows that it should be possible to deal with various classes of non-atomic vectors.

Improved outputs via methods for toString()

If the default outputs are not good enough, perhaps we could build on the fact that toString() is a generic function. We could thus try to define specific methods for objects that aren't atomic vectors.

For example, we could imagine a toy method for arrays as follows:

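toString.array <- function(x, width = NULL, ...) {
  # split the width budget across columns; NULL means no truncation
  col_width <- if (is.null(width)) NULL else floor(width / ncol(x))
  cols <- apply(x, 2, \(col) toString(col, width = col_width))
  toString(paste0("{", paste(cols, collapse = "}{"), "}"), width = width)
}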

yielding:

# A tibble: 3 x 2
  x                 y                                                                                                                          
  <named list>      <list-col>                                                                                                                 
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>                                                                                  
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1, 2, 3}{10, 11, 12}{19, 20, 21}}{{4, 5, 6}{13, 14, 15}{22, 23, 24}}{{7, 8, 9}{16, 17, 18}{25, 26, 27}}>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

or

# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>         
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1,.......}{{4,.......}{{7,....>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

depending on the width argument for toString().

That could certainly be improved but that shows that authors of other packages could implement their own methods for toString() for dealing with the display of their specific classes when appearing in a list-column (without the need for them to define vctrs classes).
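
As a rough illustration of that idea, here is a sketch of what such a method might look like for lm objects. It is only an assumption about one possible design: toString.lm() is a hypothetical method (neither stats nor pillar provides it), and showing only the model formula is an arbitrary choice. Truncation is delegated to the default toString() method through its width argument.

```r
# Hypothetical toString() method for lm objects (not part of stats or pillar).
# It shows only the model formula instead of the deparsed model internals.
toString.lm <- function(x, width = NULL, ...) {
  # deparse() may split a long formula over several strings, so re-join them
  txt <- paste(deparse(stats::formula(x)), collapse = " ")
  # the default method truncates to `width`; NULL means no truncation
  toString(txt, width = width, ...)
}

fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
toString(fit)
#> [1] "Sepal.Length ~ Petal.Length"
```

Since formatter_list_element() calls toString() on each element, a method like this would be picked up without any change to the list_col class itself.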

courtiol • Jun 15 '21 11:06

Thanks. I think the easiest way to start is to expand the contents only for elements where is_bare_atomic() holds, to show only their first three elements, and to do so only if there's space. I'll take a look in pillar.

krlmlr • Aug 06 '21 04:08
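
The three rules in the last comment (bare atomic only, first three elements, only if there's space) could be sketched roughly as below. This is not pillar's implementation: preview_element() is a hypothetical helper, the defaults n = 3 and width = 25 are assumptions, and the bare-atomic check is a base-R approximation of rlang::is_bare_atomic().

```r
# Hypothetical helper, not pillar's implementation. The name, the defaults
# `n = 3` and `width = 25`, and the fallback behaviour are assumptions.
preview_element <- function(x, ptype_txt = pillar::obj_sum(x), width = 25, n = 3) {
  short <- paste0("<", ptype_txt, ">")

  # Base-R approximation of rlang::is_bare_atomic():
  # atomic, non-NULL, and without a class attribute
  bare_atomic <- !is.null(x) && is.atomic(x) && !is.object(x)
  if (!bare_atomic || length(x) == 0) {
    return(short)
  }

  # Linear indexing, so matrices and arrays are previewed as plain vectors
  shown <- paste(as.character(x[seq_len(min(n, length(x)))]), collapse = ", ")
  if (length(x) > n) {
    shown <- paste0(shown, ", ...")
  }

  long <- paste0("<", ptype_txt, " ", shown, ">")
  # Expand only if the result fits; otherwise keep the short summary
  if (nchar(long) <= width) long else short
}

preview_element(1:2, ptype_txt = "int [2]")
#> [1] "<int [2] 1, 2>"
preview_element(1:100, ptype_txt = "int [100]")
#> [1] "<int [100] 1, 2, 3, ...>"
preview_element(list(1, 2), ptype_txt = "list [2]")
#> [1] "<list [2]>"
```

Falling back to the short summary, rather than truncating mid-value, keeps narrow columns looking exactly as they do today.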