R-ecology-lesson

Additional Instructions on Pipelines

Open kilynncole opened this issue 7 years ago • 2 comments

To help students understand how multiple functions work together within a single pipeline, I recommend adding a few more operations to the ‘Pipes’ section. The lesson mentions the importance of ordering in the second Challenge. I recommend an example using the filter() and select() functions, where the variable being filtered on is not included in the select() call. This would provide some simple operations and modifications that reinforce what a pipe is and how it is executed.

Following the command in the ‘Manipulating Data Frames’ section:

surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

In the above code, we use the pipe to send the surveys dataset first through filter() to keep rows where weight is less than 5, then through select() to keep only the species_id, sex, and weight columns. Since %>% takes the object on its left and passes it as the first argument to the function on its right, we don’t need to explicitly include the data frame as an argument to the filter() and select() functions any more. Some may find it helpful to read the pipe like the word “then”. For instance, in the above example, we took the data frame surveys, then we filtered for rows with weight < 5, then we selected columns species_id, sex, and weight. The dplyr functions by themselves are somewhat simple, but by combining them into linear workflows with the pipe, we can accomplish more complex manipulations of data frames.
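To make the "first argument" rule concrete, here is a minimal sketch comparing the nested form with the piped form. It assumes dplyr is installed and uses a small hypothetical `surveys` data frame with made-up values in place of the lesson's real data.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical five-row stand-in for the lesson's `surveys` data (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  year       = c(1990L, 1994L, 1996L, 1980L, 2000L),
  stringsAsFactors = FALSE
)

# Without the pipe: each function receives the data frame explicitly,
# and the code has to be read from the inside out
nested <- select(filter(surveys, weight < 5), species_id, sex, weight)

# With the pipe: %>% passes the left-hand result as the first argument,
# so the code reads left to right ("then ... then ...")
piped <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

# Both forms describe the same computation, so the results match
identical(nested, piped)
```

The two versions do the same work; the pipe only changes how the steps are written and read.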

I recommend inserting the following:

In the previous example, the ordering of the functions, and of the arguments within each function, is important. The columns in the output of select() appear in the same order as its arguments. For example, to have the output contain the variables in the order “species_id, weight, sex”, modify the previous command as follows:

surveys %>%
  filter(weight < 5) %>%
  select(species_id, weight, sex)
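A quick way to confirm the column order is to inspect names() on the result. This is a sketch that assumes dplyr is installed and uses a hypothetical toy `surveys` data frame rather than the lesson's data.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of `surveys` (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  stringsAsFactors = FALSE
)

reordered <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, weight, sex)

# Columns come out in the order they were listed in select():
# species_id, then weight, then sex
names(reordered)
```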

Would the output be modified in any way if the select() function were placed in the pipeline before the filter() function?

surveys %>%
  select(species_id, weight, sex) %>%
  filter(weight < 5)

No. The output does not change, because the variable being filtered on (weight) is also included in the selection.
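The claim can be checked directly by running both orderings and comparing the results. A minimal sketch, assuming dplyr is installed and using a hypothetical toy `surveys` data frame:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of `surveys` (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  stringsAsFactors = FALSE
)

filter_first <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, weight, sex)

select_first <- surveys %>%
  select(species_id, weight, sex) %>%
  filter(weight < 5)

# weight survives select(), so filter() can still use it and the results agree
all.equal(filter_first, select_first)
```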

The variable being filtered on, weight, is an integer vector. Next, filter on a specific species_id, which is a character vector, and then select only the variables sex and weight.

surveys %>%
  filter(species_id == "NL") %>%
  select(sex, weight)

Would the output be modified in any way if the select() function were placed in the pipeline before the filter() function?

surveys %>%
  select(sex, weight) %>%
  filter(species_id == "NL")

Yes. Instead of returning a data frame, the pipeline stops with an error: “Error in filter_impl(.data,quo) : Evaluation error: object ‘species_id’ not found.” (the exact wording depends on the dplyr version). This indicates that the ‘species_id’ variable is not in the object being passed through the pipeline.

The ‘surveys’ object is first passed into the select() function, so the resulting object contains only the ‘sex’ and ‘weight’ variables. filter() then cannot use ‘species_id’, because that variable is no longer in the object being passed through the pipeline.
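The failure, and one way around it, can be sketched as follows. This assumes dplyr is installed and uses a hypothetical toy `surveys` data frame; the error is captured with base R's tryCatch() so the script keeps running. The simplest fix is just to filter before selecting; the variant shown here instead keeps species_id through the filter and drops it afterwards, which is one option among several.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of `surveys` (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  stringsAsFactors = FALSE
)

# select() drops species_id, so filter() cannot find it and raises an error;
# tryCatch() returns the error message instead of stopping the script
msg <- tryCatch(
  surveys %>%
    select(sex, weight) %>%
    filter(species_id == "NL"),
  error = function(e) conditionMessage(e)
)
cat(msg, sep = "\n")  # wording varies across dplyr versions

# One fix: keep species_id until filter() has used it, then drop it
nl_only <- surveys %>%
  select(species_id, sex, weight) %>%
  filter(species_id == "NL") %>%
  select(sex, weight)

# Same result as simply filtering before selecting
all.equal(
  nl_only,
  surveys %>% filter(species_id == "NL") %>% select(sex, weight)
)
```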

Note that when the result of a pipeline is assigned to a new object with <-, the final data frame is the leftmost part of the expression.
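This note applies when a pipeline's result is stored for later use. A minimal sketch, assuming dplyr is installed, a hypothetical toy `surveys` data frame, and the illustrative object name `surveys_sml`:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of `surveys` (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  stringsAsFactors = FALSE
)

# The new object's name sits at the far left; the data flows in from
# `surveys` and the pipeline's final result is assigned to it
surveys_sml <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

surveys_sml
```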

Challenge: Using pipes, subset the surveys data to include individuals collected before 1995 and retain only the columns sex and weight, in that order.

Answer:

surveys %>%
  filter(year < 1995) %>%
  select(sex, weight)
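The same ordering lesson applies here: year is not kept by select(), so filter() must come first. Because year is gone from the result, one way to sanity-check the answer is to compare its row count with a count taken from the original data. A sketch assuming dplyr is installed and a hypothetical toy `surveys` data frame:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of `surveys` (made-up values)
surveys <- data.frame(
  species_id = c("NL", "DM", "NL", "PF", "DM"),
  sex        = c("M", "F", "F", "M", NA),
  weight     = c(120L, 3L, 4L, 2L, 40L),
  year       = c(1990L, 1994L, 1996L, 1980L, 2000L),
  stringsAsFactors = FALSE
)

answer <- surveys %>%
  filter(year < 1995) %>%
  select(sex, weight)

# year has been dropped, so count qualifying rows in the original data instead
# (filter() drops rows where the condition is NA, hence na.rm = TRUE)
nrow(answer) == sum(surveys$year < 1995, na.rm = TRUE)
```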

kilynncole, Jul 11 '18 23:07

@kilynncole We appreciate your feedback on this. We will take your suggestion into consideration.

mondorescue, Oct 17 '18 19:10

@fmichonneau @aurielfournier @anacost What do you guys think? Should we add this to the "pipe" section? I am willing to do the pull request if it helps. I personally don't find it necessary, since when I teach the lesson I almost exclusively do live coding and pretty much walk through how to use %>% with examples and exercises similar to the ones in @kilynncole's recommendation.

mondorescue, Oct 17 '18 19:10