tidyr
pivot doesn't preserve attributes
It seems most (I tested filter, select, left_join, mutate, head) dplyr functions happily (and nicely) copy over attributes with metadata. For some reason, pivot functions do not:
new_tibble(tibble(a=3,b=4),metadata="test") %>% validate_tibble() %>% attr("metadata")
[1] "test"
but:
new_tibble(tibble(a=3,b=4),metadata="test") %>% validate_tibble() %>% pivot_longer(cols=c(a,b)) %>% attr("metadata")
NULL
This may be related to the discussions about extending tibbles, but I feel redefining the class adds lots of complexity for a simple user attribute (and I honestly couldn't make much sense of github.com/tidyverse/tibble/issues/275, and it being open means it is still not clarified I guess).
Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.
library(tidyverse)
new_tibble(tibble(a=3, b=4), metadata="test") %>% validate_tibble() %>% attr("metadata")
#> [1] "test"
new_tibble(tibble(a=3, b=4), metadata="test") %>%
  validate_tibble() %>% pivot_longer(cols=c(a, b)) %>%
  attr("metadata")
#> NULL
Somewhat more minimal reprex:
library(tidyr)
df <- tibble(a = 1)
attr(df, "metadata") <- "test"
df |>
  pivot_longer(a) |>
  attr("metadata")
#> NULL
Created on 2022-10-11 with reprex v2.0.2
It's not clear to me that pivot_ functions should preserve attributes — I think it's reasonable to argue that they create new data frames in a way similar to dplyr::summarise(), rather than modifying an existing data frame like dplyr::mutate().
Why not label this as a design decision (attributes are not supported with dplyr)? At least that would be clear.
I don't think it's reasonable that I lose metadata in a pipeline depending on which functions I use. I can't really follow your argument, since here I'm just reshaping the same numbers (though I would even make the same case for summarize). And when I join data together, the attributes of one of the inputs are suddenly there? Maybe it makes sense to you.
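Whatever the eventual design decision, the attribute can be carried across the pivot by hand. This is a minimal sketch that reuses the `metadata` attribute from the reprex above. It is not a tidyr feature and only relies on base R `attr()`.

```r
library(tidyr)

df <- tibble(a = 1)
attr(df, "metadata") <- "test"

# Save the attribute before reshaping, because pivot_longer() returns a
# new data frame without it.
meta <- attr(df, "metadata")

long <- pivot_longer(df, a)

# Re-attach the saved attribute to the reshaped result.
attr(long, "metadata") <- meta

attr(long, "metadata")
#> [1] "test"
```

The same pattern should apply to any step that drops the attribute, at the cost of having to remember it at each such step.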
I'm not sure whether this addresses the same problem, but for me pivot_longer() does an amazing job of preserving the attributes of variables. I think, though, that preserving them depends on supplying a class attribute when several columns are combined into one.
I'm afraid the following minimal example is not fully minimalistic, but I'll do my best:
library(tidyr)
wide <-
  structure(
    list(
      x_1 = structure(c(1), label = "X"),
      y_1 = structure(c(2), label = "Y", labels = c(A = 1), class = c("haven_labelled", "vctrs_vctr", "double")),
      y_2 = structure(c(3), label = "Y", labels = c(A = 1), class = c("haven_labelled", "vctrs_vctr", "double")),
      z_1 = structure(c(2), label = "Z"),
      z_3 = structure(c(3), label = "Z")
    ),
    row.names = c(NA, -1L),
    class = c("tbl_df", "tbl", "data.frame")
  )
long <-
  wide %>%
  pivot_longer(everything(), names_sep = "_", names_to = c(".value", "wave"))
attributes(long$x)
#> $label
#> [1] "X"
attributes(long$y)
#> $label
#> [1] "Y"
#>
#> $labels
#> A
#> 1
#>
#> $class
#> [1] "haven_labelled" "vctrs_vctr" "double"
attributes(long$z)
#> NULL
Attributes of "x" are preserved because there is only one occurrence.
Attributes of "y" are preserved: both variables are merged into one, with a class provided.
Attributes of "z" are lost: both variables are merged into one, without a class provided.
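For the "z" case, one possible workaround is to copy the label back from one of the source columns after pivoting. This sketch uses a cut-down version of `wide` with only the two "z" columns. It assumes the label is identical across the merged columns, so taking it from `z_1` is safe.

```r
library(tidyr)

# Cut-down input: two unclassed columns that share the same label.
wide <- tibble(
  z_1 = structure(2, label = "Z"),
  z_3 = structure(3, label = "Z")
)

long <- pivot_longer(
  wide,
  everything(),
  names_sep = "_",
  names_to = c(".value", "wave")
)

# Re-attach the label by hand, taking it from the first source column.
attr(long$z, "label") <- attr(wide$z_1, "label")

attr(long$z, "label")
#> [1] "Z"
```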