
Feature Request: Intuitive DataFrame Transposition Functionality in dplyr

Open · hsalberti opened this issue 2 years ago · 1 comment

Brief description of the problem:

Transposing a data frame with dplyr currently requires combining the tidyr functions pivot_longer() and pivot_wider(). This is not as straightforward as the base R t() function and can confuse users, especially those new to dplyr or coming from a base R background. An intuitive, one-step transposition function would greatly simplify this kind of data manipulation in dplyr.

Expected output:

A new function, possibly named transpose_df(), that transposes a data frame in a single step, mirroring the simplicity of the base R t() function.
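To make the request concrete, here is a minimal sketch of what a user-level transpose_df() might look like. It is hypothetical (no such function exists in dplyr), and it simply wraps base R t(), so every column is coerced to one common type before transposing. The `names_to` argument and the `col1`, `col2`, ... column labels are illustrative choices, not an existing API.

```r
library(tibble)

# Hypothetical helper, not part of dplyr: transposes via a single-type matrix
transpose_df <- function(data, names_to = "variable") {
  m <- t(as.matrix(data))                         # old columns become rows
  colnames(m) <- paste0("col", seq_len(ncol(m)))  # label the new columns
  as_tibble(m, rownames = names_to)               # keep old column names as a column
}

df <- tibble(x = 1:3, y = 4:6, z = 7:9)
res <- transpose_df(df)
print(res)
```

Because the work is done by as.matrix() and t(), the result inherits their coercion rules: a data frame mixing numbers and strings comes back as all character.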


```r
library(dplyr)
library(tidyr)

# Sample data frame
df <- tibble::tibble(x = 1:3, y = 4:6, z = 7:9)

# Current method for transposing in dplyr/tidyr
transposed_df <- df %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = "variable", values_from = "value")

# Print the transposed data frame
print(transposed_df)
```
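As written, the pipeline above does not produce a transpose: no column identifies the original rows, so pivot_wider() cannot place each value uniquely. It warns and returns a single row of list-columns. A minimal sketch of a pivot-based transpose that does work, assuming each row is first tagged with a temporary identifier (the `.row` column name and the `col1`, `col2`, ... labels are hypothetical choices):

```r
library(dplyr)
library(tidyr)

df <- tibble::tibble(x = 1:3, y = 4:6, z = 7:9)

transposed_df <- df %>%
  # Hypothetical helper column: labels the original rows col1, col2, col3
  mutate(.row = paste0("col", row_number())) %>%
  # One row per (original row, variable) pair; .row is kept as an identifier
  pivot_longer(-.row, names_to = "variable", values_to = "value") %>%
  # Spread back out: original rows become columns, `variable` becomes the id
  pivot_wider(names_from = .row, values_from = value)

print(transposed_df)
```

The identifier is what makes the reshape well defined; the same coercion issue applies, since pivot_longer() has to stack all columns into one `value` column.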

hsalberti, Nov 08 '23 12:11

Your example doesn't work.

This works:

```r
transposed_df <- data.frame(t(df), row.names = names(df))
# or, with a pipe:
transposed_df <- df %>% t() %>% data.frame(row.names = names(df))
print(transposed_df)
```

```
  X1 X2 X3
x  1  2  3
y  4  5  6
z  7  8  9
```

The columns can then be renamed with `setNames(transposed_df, c("a", "b", "c"))` or `setNames(transposed_df, paste0("col", 1:nrow(df)))`:

```
  a b c
x 1 2 3
y 4 5 6
z 7 8 9
```

```
  col1 col2 col3
x    1    2    3
y    4    5    6
z    7    8    9
```

philibe, Nov 08 '23 13:11

We don't believe a data frame transpose mechanism is that useful, because data frame columns can have different types, while a transpose requires all columns to be coerced to a single common type.
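A small illustration of that coercion, using a hypothetical three-column data frame: t() first calls as.matrix(), and because one column is character, every value in the result becomes a string. Transposing back does not restore the original column types.

```r
# Hypothetical mixed-type data frame: integer, character and double columns
mixed <- data.frame(id = 1:2, name = c("a", "b"), score = c(1.5, 2))

# as.matrix() (called by t()) must pick one type for the whole matrix
m <- t(mixed)
typeof(m)  # "character"

# Round trip: `id` comes back as character, not integer
back <- as.data.frame(t(m))
str(back)
```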

If you need a true transpose operation, the best and most efficient approach is usually to convert the data to a single-type matrix and use t(). That will be way faster than anything we could do with multi-type data frame columns, and it is really the only well-defined way to do this.
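A minimal sketch of that workflow, assuming a hypothetical table with one label column and several numeric columns: move the labels into row names so only numeric columns remain, convert to a double matrix, then call t().

```r
library(tibble)

# Hypothetical example data: a label column plus numeric columns
scores <- tibble(student = c("ann", "bob"), math = c(90, 85), art = c(70, 95))

# Labels become row names; the remaining columns form a single-type (double) matrix
m <- as.matrix(column_to_rownames(scores, "student"))

# Subjects become rows, students become columns
tm <- t(m)
print(tm)
```

Doing the conversion explicitly makes the type decision visible, rather than leaving as.matrix() to pick a common type silently.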

DavisVaughan, Jul 22 '24 17:07