tidyr

separate_wider_* functions remove row names from data frame

Open MarcoBruttini opened this issue 2 years ago • 2 comments

If you use separate_wider_* on a data frame with row names, the resulting data frame no longer has row names. Since the documentation states that rows are not affected by these functions, I suppose that this is unwanted behavior.

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  col_to_separate = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)

df
#>   col_to_separate
#> a             A-A
#> b             B-B
#> c             C-C

df %>%
  separate_wider_delim(
    col_to_separate,
    delim = "-",
    names = paste0("C", 1:2)
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

df %>%
  separate_wider_position(
    col_to_separate,
    widths = c(C1 = 1, 1, C2 = 1)
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

df %>%
  separate_wider_regex(
    col_to_separate,
    patterns = c(C1 = ".", ".", C2 = ".")
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

Created on 2023-05-24 with reprex v2.0.2

MarcoBruttini, May 24 '23 14:05
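
Until the behavior changes, one possible workaround is to carry the row names through the call as an ordinary column. The sketch below is not from the thread: it assumes the tibble helpers rownames_to_column() and column_to_rownames(), and the temporary column name ".rn" is an arbitrary choice made here for illustration.

```r
library(tidyr)
library(tibble)

df <- data.frame(
  row.names = letters[1:3],
  col_to_separate = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)

# Workaround sketch: move the row names into a temporary column (".rn" is
# an arbitrary name), separate, then turn that column back into row names.
res <- df |>
  rownames_to_column(var = ".rn") |>
  separate_wider_delim(col_to_separate, delim = "-", names = c("C1", "C2")) |>
  as.data.frame() |>
  column_to_rownames(var = ".rn")

rownames(res)
#> [1] "a" "b" "c"
```

The as.data.frame() step is there because column_to_rownames() is meant for base data frames, while separate_wider_delim() currently hands back a tibble.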

Somewhat more minimal reprex:

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"

df |> 
  separate_wider_delim(x, delim = "-", names = c("a", "b")) |> 
  rownames()
#> [1] "1" "2" "3"

Created on 2023-11-01 with reprex v2.0.2

Looks like the root cause of this is unpack().

hadley, Nov 01 '23 19:11

Seems like pack() and unpack() need a little bit of special handling of row names with base data.frames. I see 3 distinct problems:

  • pack() should strip row names from the inner packed data frames and retain them only on the outer data frame
  • unpack() should probably return a data frame if the input was a data frame (rare, since mostly you unpack a tibble)
  • unpack() should keep row names, which is particularly important if we output a base data frame due to the bullet above

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"

df <- tidyr::pack(df, foo = x)

# Row names on outside
rownames(df)
#> [1] "a" "b" "c"

# Row names on inside too
# (these should probably get removed)
rownames(df$foo)
#> [1] "a" "b" "c"

# - Should this return a data.frame?
# - Should this keep row names of df?
# (Probably yes to both)
tidyr::unpack(df, foo)
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 A-A  
#> 2 B-B  
#> 3 C-C

Created on 2024-07-27 with reprex v2.0.2

DavisVaughan, Jul 27 '24 13:07
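
To make the second and third problems above concrete, here is a minimal sketch of what a row-name-preserving unpack could look like from the outside. The wrapper unpack_keep_rownames() is hypothetical and not part of tidyr; it only illustrates the proposed behavior under the assumption that a base data frame input should yield a base data frame output, and it says nothing about how a fix would actually be implemented inside unpack().

```r
library(tidyr)

# Hypothetical helper (not a tidyr function): unpack `cols`, and if the
# input was a base data frame, return a base data frame with its row names.
unpack_keep_rownames <- function(data, cols) {
  was_base_df <- is.data.frame(data) && !inherits(data, "tbl_df")
  # .row_names_info() is positive only for non-automatic row names
  has_rn <- .row_names_info(data) > 0L
  rn <- rownames(data)

  out <- tidyr::unpack(data, {{ cols }})

  if (was_base_df) {
    out <- as.data.frame(out)
    if (has_rn) rownames(out) <- rn
  }
  out
}

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
packed <- tidyr::pack(df, foo = x)

res <- unpack_keep_rownames(packed, foo)
class(res)
#> [1] "data.frame"
rownames(res)
#> [1] "a" "b" "c"
```

Tibble inputs pass through unchanged, so the wrapper only alters the base data frame case discussed in the bullets.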