tidyr
separate_wider_* functions remove row names from data frame
If you use a separate_wider_* function on a data frame with row names, the resulting data frame no longer has them. Since the documentation states that rows are not affected by these functions, I assume this behavior is unintended.
library(tidyverse)
df <- data.frame(
  row.names = letters[1:3],
  col_to_separate = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
df
#> col_to_separate
#> a A-A
#> b B-B
#> c C-C
df %>%
  separate_wider_delim(
    col_to_separate,
    delim = "-",
    names = paste0("C", 1:2)
  )
#> # A tibble: 3 × 2
#> C1 C2
#> <chr> <chr>
#> 1 A A
#> 2 B B
#> 3 C C
df %>%
  separate_wider_position(
    col_to_separate,
    widths = c(C1 = 1, 1, C2 = 1)
  )
#> # A tibble: 3 × 2
#> C1 C2
#> <chr> <chr>
#> 1 A A
#> 2 B B
#> 3 C C
df %>%
  separate_wider_regex(
    col_to_separate,
    patterns = c(C1 = ".", ".", C2 = ".")
  )
#> # A tibble: 3 × 2
#> C1 C2
#> <chr> <chr>
#> 1 A A
#> 2 B B
#> 3 C C
Created on 2023-05-24 with reprex v2.0.2
Somewhat more minimal reprex:
library(tidyverse)
df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"
df |>
  separate_wider_delim(x, delim = "-", names = c("a", "b")) |>
  rownames()
#> [1] "1" "2" "3"
Created on 2023-11-01 with reprex v2.0.2
Looks like the root cause of this is unpack()
Seems like pack() and unpack() need a bit of special handling of row names with base data frames. I see 3 distinct problems:
- pack() should strip row names from the inner packed data frames and retain them only on the outer frame
- unpack() should probably return a data frame if the input was a data frame (rare, mostly you unpack a tibble)
- unpack() should keep row names, which is particularly important if we output a base data frame due to the above bullet
library(tidyverse)
df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"
df <- tidyr::pack(df, foo = x)
# Row names on outside
rownames(df)
#> [1] "a" "b" "c"
# Row names on inside too
# (these should probably get removed)
rownames(df$foo)
#> [1] "a" "b" "c"
# - Should this return a data.frame?
# - Should this keep row names of df?
# (Probably yes to both)
tidyr::unpack(df, foo)
#> # A tibble: 3 × 1
#> x
#> <chr>
#> 1 A-A
#> 2 B-B
#> 3 C-C
Created on 2024-07-27 with reprex v2.0.2
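Until this is addressed, one possible workaround is to move the row names into a regular column before separating and restore them afterwards. This is only a sketch, not a fix in tidyr itself: it assumes the tibble helpers rownames_to_column() and column_to_rownames(), and the column name "row_id" is a hypothetical placeholder that must not clash with an existing column.

```r
library(tidyr)
library(tibble)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)

out <- df |>
  # Stash the row names in a temporary column ("row_id" is a made-up name)
  rownames_to_column("row_id") |>
  separate_wider_delim(x, delim = "-", names = c("C1", "C2")) |>
  # Turn the temporary column back into row names; this yields a base data frame
  column_to_rownames("row_id")

rownames(out)
```

With this approach rownames(out) should give back the original "a", "b", "c", and out should be a base data frame rather than a tibble.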