R-genomics

tidyr - very important package for data analysis

Open durrantmm opened this issue 8 years ago • 2 comments

It's hard to overstate how important it is to understand what it means for data to be 'tidy'. Tidy data is an important concept if you want to make the most of many other R packages, such as ggplot2 and dplyr. An introduction to tidy data and the tidyr package can be found here.

Tidy data is generally not the way that we intuitively think about organizing data. When data is considered 'tidy', it generally follows this pattern:

  • Each column is a variable.
  • Each row is an observation.
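
To make the pattern concrete, here is a minimal sketch in base R. The gene names, sample names, and values are invented purely for illustration; they are not from any real dataset.

```r
# Hypothetical expression measurements laid out the "intuitive" wide way:
# one row per gene, one column per sample.
wide <- data.frame(
  gene    = c("geneA", "geneB"),
  sample1 = c(5.1, 2.3),
  sample2 = c(4.8, 2.9),
  stringsAsFactors = FALSE
)

# The same values in tidy form: each column is a variable
# (gene, sample, expression) and each row is one observation.
tidy <- data.frame(
  gene       = c("geneA", "geneB", "geneA", "geneB"),
  sample     = c("sample1", "sample1", "sample2", "sample2"),
  expression = c(5.1, 2.3, 4.8, 2.9),
  stringsAsFactors = FALSE
)

print(wide)
print(tidy)
```

In the wide layout the sample name is hidden in the column headers; in the tidy layout it is an ordinary variable that can be filtered, grouped, or mapped to a plot aesthetic.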

tidyr has two functions that improve upon the well-known reshape2 functions dcast() and melt(). The analog of dcast() is called spread(), and the analog of melt() is gather(). It also contains an important function called separate(), which splits a single column of a data frame into multiple columns based on a delimiter. This is often necessary when one column packs several pieces of information into a single value, such as a compound ID.
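
The three functions named above could be tried out roughly as follows. This is only a sketch: the data frame and its column names (gene_id, sample1, sample2) are hypothetical, and the compound ID format "chr1_geneA" is an assumption made up for the example. Note that in current tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(), although they remain available.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample.
# gene_id packs two pieces of information (chromosome and gene) into one value.
wide <- data.frame(
  gene_id = c("chr1_geneA", "chr2_geneB"),
  sample1 = c(5.1, 2.3),
  sample2 = c(4.8, 2.9),
  stringsAsFactors = FALSE
)

# gather(): collapse the sample columns into key/value pairs (wide -> long).
long <- gather(wide, key = "sample", value = "expression", sample1, sample2)

# spread(): the inverse operation, one column per sample again (long -> wide).
wide_again <- spread(long, key = sample, value = expression)

# separate(): split the compound ID into two columns on the "_" delimiter.
long_split <- separate(long, col = gene_id,
                       into = c("chromosome", "gene"), sep = "_")

print(long_split)
```

The long form produced by gather() is the shape that ggplot2 and dplyr generally expect, which is why it tends to be the first step before plotting or summarising.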

Learning the basics of tidy data and the tidyr package will greatly benefit our students.

durrantmm, Oct 06 '17 14:10

I support this! And I want to suggest that the example be data that needs to be plotted [with ggplot2], so there's a big payoff. I think if you've never had to wrestle already-rectangular data into a different shape for a reason, the whole long/wide distinction is hard to appreciate.

mfoos, Oct 21 '17 01:10

I'm new to the Carpentries, and I'm looking for a place to help. How can I help here?

gabrielodom, Aug 23 '18 23:08