r4ds-exercise-solutions
Exercise 12.3.3 - Untidy solution?
https://jrnold.github.io/r4ds-exercise-solutions/tidy-data.html#exercise-12.3.3
I believe there is a problem with the solution to this exercise: it generates untidy data. Please correct me if I am wrong.
In the solution for this exercise, the "people" tibble
people <- tribble(
  ~name,             ~key,     ~value,
  #-----------------|--------|------
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)
is widened like this:
pivot_wider(people, names_from = "name", values_from = "value")
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 2 x 3
#>   key    `Phillip Woods` `Jessica Cordero`
#>   <chr>  <list>          <list>
#> 1 age    <dbl [2]>       <dbl [1]>
#> 2 height <dbl [1]>       <dbl [1]>
However, as I understand it, the resulting tibble is untidy: the column names, e.g. "Phillip Woods", are values of the name variable rather than variables themselves.
Instead, I think the authors intended the pivoting to be done with names_from="key" and values_from="value", resulting in this tibble:
# A tibble: 2 x 3
  name            age       height
  <chr>           <list>    <list>
1 Phillip Woods   <dbl [2]> <dbl [1]>
2 Jessica Cordero <dbl [1]> <dbl [1]>
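For reference, here is a minimal sketch of the call I have in mind. It assumes a tidyr version in which `values_fn` accepts a single function (the warning above suggests `values_fn = list`, so I am using that here); the name `wide` is just mine for illustration.

```r
library(tibble)
library(tidyr)

# Same data as in the solution: Phillip Woods has two "age" rows.
people <- tribble(
  ~name,             ~key,     ~value,
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)

# Pivot on key/value so that each person is a row and each
# measured variable (age, height) becomes a column.
# values_fn = list keeps the duplicated age as a list-column
# and suppresses the warning.
wide <- pivot_wider(
  people,
  names_from = key,
  values_from = value,
  values_fn = list
)

print(wide)
```

The duplicate is still there (Phillip Woods has two ages in one cell), but the columns are now variables rather than people.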
In the r4ds book (https://r4ds.had.co.nz/tidy-data.html) the column names also seem to have been updated to reflect this: they are now called "names" and "values" instead of "key" and "value":
people <- tribble(
  ~name,             ~names,   ~values,
  #-----------------|--------|------
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)
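With the book's column names, the same pivot would presumably be written with `names_from = names` and `values_from = values`. As a sketch, the following uses `values_fn = length` (one of the options the warning message suggests) to show where the duplicate arises; again this assumes a tidyr version where `values_fn` accepts a single function, and `counts` is just an illustrative name.

```r
library(tibble)
library(tidyr)

# The tibble as it appears in the book, with names/values columns.
people <- tribble(
  ~name,             ~names,   ~values,
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)

# Count how many values fall into each (name, variable) cell.
# Any count greater than 1 marks a cell that is not uniquely identified.
counts <- pivot_wider(
  people,
  names_from = names,
  values_from = values,
  values_fn = length
)

print(counts)
```

Only the age cell for Phillip Woods should have a count above 1, which is the row pair that triggers the list-column warning.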