r_intro_bc_stats
Some reflection on ATL version
Things that went well:
- RStudio Cloud worked flawlessly. Multiple people told me they wish this had been the way they had received their first exposure to R/RStudio because it just worked, without installation woes or issues getting materials.
- Quick Data Science worked really smoothly! It also nicely set up the r4ds "Applied Data Science" flowchart.
Things that I think could be improved:
- The morning is pretty dry. Some options:
  - Do Getting Started and Visualization in the morning, and Data Basics and Data Transformation in the afternoon.
  - Fully integrate Getting Started and Data Basics materials into Visualization and Data Transformation, so the foundations are covered just as they are needed. As in the first option, data vis in the morning and data transformation in the afternoon might work well.
- Data transformation materials dive too deep for this audience. I'd probably do less detail on individual functions and spend more time motivating how combining them (and with ggplot) lets you answer interesting questions:
  - I think the filter() section has too much in it for this audience. Maybe cut out the combining of logical expressions.
  - Integrate more with previous sections, e.g. use filter() to find out what make/model those "two-seater" cars from the Vis section are.
  - summarise() + group_by() is something I think a lot of students can immediately see value in, but is at the very end of this section.
  - Make the final task something that combines just one or two dplyr verbs with a plot.
- Give more pointers on how to learn more. We point to the r4ds book, but what other resources should we point to?
  - How to find local user groups.
  - Existing online communities.
  - Places/people who deliver more training like this workshop.
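
To make the data transformation suggestions above concrete, here is a rough sketch of what a shorter, more integrated exercise could look like. It assumes the Vis section uses the `mpg` data that ships with ggplot2 (where the two-seater cars are labelled `class == "2seater"`); the object names `two_seaters`, `hwy_by_class` and `p` are just placeholders, not anything from the existing materials.

```r
library(dplyr)
library(ggplot2)

# Assumption: the "two-seater" cars from the Vis section are the
# class == "2seater" rows of ggplot2's mpg data.
# filter() answers: what make/model are they?
two_seaters <- mpg %>%
  filter(class == "2seater") %>%
  distinct(manufacturer, model)
two_seaters

# group_by() + summarise(): average highway mileage for each class of car
hwy_by_class <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), n_cars = n())

# A possible final task: one dplyr summary fed straight into a plot
p <- ggplot(hwy_by_class, aes(x = reorder(class, mean_hwy), y = mean_hwy)) +
  geom_col() +
  labs(x = "class", y = "mean highway mpg")
p
```

The point of a task like this would be that each verb is used once, to answer a question students already asked while looking at the earlier plots.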
+1 to RStudio Cloud. Flawless.
For the data transformations section: I know mutate is important, but I'd almost prefer to teach select/filter and how to combine them, because there are some nice, simple ways to do that, and then focus on summarize / group_by. That's a tough call, though. Also for data transformations: ask some leading questions about the data (e.g. What would you have to do with this data to answer questions about trends in life expectancy in Asia? You want to know the country with the largest population in each continent; how would you do that?) and have people think about the basic tabular data operations (subset, add columns, summarize, possibly in groups) before showing them the functions.
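
As a sketch of how those leading questions could map onto the basic operations once people have thought them through: the tiny table below is made up purely for illustration (the country names and numbers are not real data, and the column names are my own assumption), so swap in whatever dataset the workshop actually uses.

```r
library(dplyr)

# Made-up toy values for illustration only (not real statistics)
countries <- tibble(
  country   = c("A", "B", "C", "D", "E", "F"),
  continent = c("Asia", "Asia", "Asia", "Europe", "Europe", "Africa"),
  pop       = c(120, 300, 45, 80, 60, 150),
  life_exp  = c(70, 65, 75, 80, 79, 55)
)

# Subset: "questions about Asia" means keeping only the Asia rows
asia <- countries %>%
  filter(continent == "Asia")

# Add a column: each country's share of its continent's population
with_share <- countries %>%
  group_by(continent) %>%
  mutate(pop_share = pop / sum(pop)) %>%
  ungroup()

# Summarize in groups: the most populous country in each continent
largest <- countries %>%
  group_by(continent) %>%
  filter(pop == max(pop)) %>%
  ungroup() %>%
  select(continent, country, pop)
largest
```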
Related to the last data transformation point: I've never actually done this but am dying to try (someday) a "recreate this plot" exercise. You wouldn't have to do a dplyr+ggplot2 exercise that way, but if you had time, I think it would be a great capstone.
Agree with all other points. Thanks for summarizing @cwickham !
@ChristinaLK Yes! I did a “recreate this plot” type exercise in a format for slightly more advanced R users: https://github.com/cwickham/data-science-in-tidyverse/blob/master/slides/04-Case-Study.pdf And from memory it worked pretty well...it’s a good way to tie the dplyr and ggplot stuff together. Might be easier with this audience to stick with the same data as they have seen.
I agree with all the comments above. It would definitely help the dplyr section to have it integrated with ggplot in a meaningful way.
I would say that if we did somehow rework the curriculum, I would want to introduce a data modeling section with at least lm. We discuss adding a model to a plot with geom_smooth, but we don't give any insight into how that modeling works beyond the fact that it is just set to lm. I think if you are able to understand a basic model fit and see how it is plotted, those two together are super powerful.
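
Something like the following might be enough to connect the two. This is only a sketch, and it assumes the `mpg` data from ggplot2 with `hwy` modelled on `displ`; the actual variables would depend on what the geom_smooth example in the materials uses.

```r
library(ggplot2)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the fitted line

# Overlay the line built from the coefficients (dashed red) on the
# geom_smooth line; the two should coincide
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_abline(
    intercept = coef(fit)[["(Intercept)"]],
    slope = coef(fit)[["displ"]],
    colour = "red", linetype = "dashed"
  )
p
```

Seeing the hand-built line land on top of the smoothed one is a quick way to show that geom_smooth is not doing anything beyond the lm fit.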
I would add to the previous comments: I do feel like we should spend a little less time on each individual dplyr function and use them in concert with previously learned things such as plotting.